Real world is not a series of rows with their attributes as columns. It is chaotic and unimaginable to naked eye. In my previous article “Power of Graph Technology in AI landscape”, I had quoted some of the predictions from Gartner and other key reports, on how Graph Technologies will be used more and more in our connected world and how they are going to power-up our AI landscape. Google, LinkedIn, Walmart, Facebook, all these top organizations make use of Graph Technologies behind the scenes and give us the best of services. The demand for Graph technology usage will only multiply in the coming years because of their easy application in our model building to derive better, faster, and smarter results
SwissCognitive Guest Blogger: Vidhya Veeraraghavan – Vice President, Artificial Intelligence Delivery Lead – Financial Crime & Fraud Detection – “Fantastic Graph Algorithms and how to use them in AI models”
In this article we are going to navigate through some of the Fantastic Graph algorithms and how to use them in our AI models.
Let us begin.
Why should we really care about Graph Algorithms?
Graph Algorithms are a set of instructions or analytical tools, that traverse a graph to determine strength and relations between the entities present in a graph, i.e. the nodes and the edges. Graph algorithms are used to help make sense of connected data. Believe it or not, we are all part of some networks, be it the social network or the computer networks or the protein networks in our blood or the nervous system within our body, everything is connected. Understanding these networks and connections have become important because they help uncover the incredible potential of insights and in turn innovations. Graph Algorithms have the unique capabilities to understand the structures and reveal patterns within these connections and networks. Particularly, these algorithms come in very handy to enhance our AI models.
Let me walk you through some of these Graph Algorithms in the context of Artificial Intelligence models.
Fantastic Graph Algorithms & their usage in AI models
Our AI models are expected to make choices as we humans do. Humans use contextual information to make better decisions and apply the earned knowledge in new situations. Context helps us improve predictions. We know AI is good at specific, well-defined tasks, but struggles with ambiguity. Graph-infused ML can help add the contextual information which is important to better decisions. People are influenced by their social networks. So understanding these networks turns out to be essential for our AI models to work better.
As we have seen so far, Graph algorithms are primarily used for exploring connected data. There are 3 areas of analysis that are at the heart of the graph algorithms. Pathfinding, Centrality and Community Detection. In the remaining part of the article, we will see the popular algorithms under these areas of analysis and how these algorithms can be leveraged by our AI models.
Graphs can be explored in two ways, a general discovery, or a specific & explicit search. Breadth First Search (BFS) and Depth First Search (DFS) are fundamental for traversing graphs. Pathfinding algorithms along with Graph search algorithms (BFS & DFS) are used to identify optimal routes through graphs by exploring routes between nodes, starting at one node, and traversing through all the relationships until the destination node. Pathfinding algorithms are popularly used in Google Maps, GPS tracking, LinkedIn connection suggestions, Walmart store aisle hops etc.
The most popular Pathfinding algorithm is Dijkstra’s graph search algorithm can be used in many AI applications, such as game simulations, search engines, GPS tracking etc. This algorithm finds the shortest path between two nodes in a graph. It is an iterative algorithm that starts with the source node and works its way to the destination node. For each new node discovered, Dijkstra’s algorithm calculates the shortest path to the destination node using the currently known distances. When traversing using Dijkstra’s algorithm, any node in the graph can be considered the root node. Minimum Spanning Tree (MST) is another algorithm which is used for pathfinding analysis. It is quite useful in designing networks, in telecom, computer and electrical grids. This algorithm can be used in AI models built for Image Segmentation, where we first construct an MST on a graph where pixels are nodes and distances between pixels are based on some similarity measure (color, intensity, etc.). There are few more algorithms under Pathfinding which are also used in AI use-cases. You can have a closer look at them in this library link from O’Reilly.
Centrality algorithms find the important vertices (or the influencer nodes) in a graph, based on their connections with other vertices in that network. Indicators of centrality assign scores or rankings to vertices and edges within a graph corresponding to their network position, based on how close they are to the centre of connection-based activity. Centrality measures are also effective as a comparison factor with other domain-specific metrics and can be used as a bridge to other areas. Centrality algorithms are popularly used in Social Network analysis, Game theory, Computer Vision, Network Security, and resource optimization problems.
Most frequently used Centrality algorithms are PageRank and Betweenness & Degree Centrality. PageRank algorithm measures the importance of each node within the graph, based on the number incoming links and the importance of the corresponding source nodes. The underlying assumption is that a page is only as important as the pages that link to it. PageRank algorithm is the Power of Google search (you can read more about this in The original google paper). The mathematics of PageRank are entirely general and apply to any graph or network in any domain. It can also be used in building AI models for link prediction and recommendation engines. Betweenness Centrality is a way of detecting the amount of influence a node has over the flow of information or resources in a graph. It can be used to find nodes that serve as bridges from one part of a graph to another. Highly used in the context of identifying the influencers both in legitimate and criminal organizations. Influencers in an organization need not always be in management position. Identifying Fraud-ring masters and the depth of their influence becomes critical in Money Laundering problems to follow the money. Degree Centrality algorithm on the other hand counts the number of incoming and outgoing relationships from a node. It is used to find popular nodes in a graph, for eg: identifying the social-media influencers who can be targeted for promoting specific products or bad actors on an online auction site or stock exchange brokers who collude with other bad actors artificially increasing the prices. Centrality in these 2 examples will be higher for the influencers and the bad actors than the others.
Identifying communities inside a network can prove to be very important esp. when you want to discover group behaviours or emerging trends. The basic idea is that members inside a cluster or a community will have more relationships within the group than with nodes outside of their cluster. Community detection is specially tailored for network analysis which depends on a single attribute type (edges) unlike Clustering (ML technique) which includes multiple attributes to determine the clusters.
One of the popular and scalable for Community Detection is Louvain modularity algorithm primarily used to partition the large networks into clusters. It is a hierarchical clustering algorithm that recursively merges communities into a single node and executes the modularity clustering on the condensed graphs using greedy optimization method. It is fast algorithm with rapid convergence and brings high modularity output (i.e. better communities). This algorithm can be quite handy in Fraud analysis, when you want to know whether a group has just a few discrete bad behaviours or is acting as a ‘Fraud Ring’. Label Propagation is another popular Community Detection fast algorithm which infers clusters by spreading labels based on neighbourhood majorities. The intuition behind the algorithm is that a single label can quickly become dominant in a densely connected group of nodes but will have trouble crossing a sparsely connected region. An interesting Natural Language Processing (NLP) use case can be found on Twitter polarity classification with label propagation. In this digital age where customers demand hyper-personalization as a minimum requirement, Strongly Connected Components algorithm could be one of the favourites among models built for product recommendations. This algorithm finds groups where each node is reachable from every other node in that same group following the direction of relationships, making product recommendations based on group affiliation or similar items easier and more effective.
In this article, we checked out some fantastic Graph algorithms and why should we really care about them so much. These algorithms are fundamentals to Graph Technology which is gaining prominence across domains solving some critical problems such as supply chain management, fraud detection, anti-money laundering, network security and more. In the practical use-cases provided in this article we saw the role that these algorithms can play in enhancing our AI models in computer vision, NLP, and network analysis. Unquestionably, Graph technology will continue to boost our AI models in this connected world.
About the Author:
Vidhya Veeraraghavan is a Data Evangelist, Keynote Speaker and Thought Leader based in Bangalore, India. She has won many leadership awards and is well-respected in the Data community. She is a hands-on problem solver delivering AI and ML projects at all levels of business making significant strategic benefits to organizations across the globe. With her contribution and her ability to build, shape, lead, inspire, coach & mentor high-energy teams, Vidhya has proved to be an inspiration to all.