Surprising Applications of a Simple Graph Concept
This article is a riff on Mark Newman’s textbook Networks and its section on the Graph Laplacian. Here I will walk you through the concept and demonstrate its utility with some coding experiments.
Warning: The following contains eigenvalues and eigenvectors. If you are unfamiliar, here’s a primer :) https://medium.com/fintechexplained/what-are-eigenvalues-and-eigenvectors-a-must-know-concept-for-machine-learning-80d0fd330e47
The Graph Laplacian
The Laplacian is a commonly used tool in the study of networks. It’s a matrix whose entries encode the connections of the network it represents. Like the adjacency matrix, it exists because it is easier for a computer to work with an array of numbers than with a drawing of shapes.
The formula for the Laplacian is the degree matrix D minus the adjacency matrix A: L = D - A. Each diagonal element is the degree of the corresponding node. The off-diagonal elements in a row are either 0 or -1, for unconnected pairs and connected pairs respectively. These properties lead to the corollary that the sum of the terms in any row (or column) is always 0.
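The construction above takes only a few lines of numpy. The edge list here is a small graph I made up purely for illustration:

```python
import numpy as np

# a small illustrative graph (this edge list is made up for the example)
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4)]
n = 5

A = np.zeros((n, n))        # adjacency matrix
for i, j in edges:
    A[i, j] = A[j, i] = 1

D = np.diag(A.sum(axis=1))  # degree matrix: node degrees on the diagonal
L = D - A                   # the graph Laplacian

print(L.sum(axis=1))        # every row sums to 0
```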
Since A and D are both symmetric real-valued matrices, L is also symmetric, which means its eigenvalues are all real. Moreover, L is positive semidefinite (xTLx works out to a sum of squared differences across edges), so it has no negative eigenvalues. If we add the fact that L has a 0 eigenvalue for every connected component, then we can conclude that the smallest eigenvalues are all 0.
To tie things back together, the rows of L summing to 0 means that the vector of all ones, 1, is always an eigenvector of L with eigenvalue 0. This consequence becomes useful in many applications of the Laplacian, and it will show up again in my diffusion experiment.
Many applications of L are concerned with the optimization of some scalar value, usually of the form R = xTLx. Values of this form are amenable to eigenvalue analysis, since if x is an eigenvector of L with eigenvalue λ then R = λ|x|². Using this property we can easily find minimal or maximal values of R by selecting the smallest or largest eigenvalues and their corresponding eigenvectors.
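Here is a quick numerical check of that property, on a triangle graph I chose for simplicity:

```python
import numpy as np

A = np.array([[0, 1, 1],
              [1, 0, 1],
              [1, 1, 0]], dtype=float)  # triangle graph
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)           # eigh: for symmetric matrices
x = vecs[:, -1]                          # eigenvector of the largest eigenvalue
R = x @ L @ x                            # R = xTLx

print(np.isclose(R, vals[-1] * (x @ x)))  # True: R = λ|x|²
```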
Much of L’s utility derives from the fact that element i of Lx is the sum of the differences of the variable x between node i and its adjacent neighbors. For example, in partitioning and visualization, when we choose to minimize R, nodes densely connected together end up with x values very close to one another. The opposite is true for two clusters with only sparse connections between them: the average values of x in the two clusters will be pulled apart.
For a final example, take the network above: the blue node at the top is connected to the three yellow nodes. The first row of the Laplacian is [3, -1, -1, -1, 0, 0, 0], corresponding to the degree of the blue node and a -1 for each connection to a yellow node. If we multiply L by a variable x, the first element of Lx is 3x_1 - x_2 - x_3 - x_4, which can be rewritten as (x_1 - x_2) + (x_1 - x_3) + (x_1 - x_4).
Partitioning a Network
Whenever you are working with large network datasets, it’s often a good idea to check which clusters the network breaks up into. Network partitioning is about finding a good labelling for the nodes. A ‘good’ partition of a network satisfies two constraints: nodes that are highly connected to a group share that group’s label, and nodes that are barely connected to a group get a different label.
Spectral Partitioning is a technique for assigning group labels to nodes based on the eigenvector corresponding to the smallest non-zero eigenvalue (the Fiedler eigenvector). It consists of only three simple steps:
- Construct your Laplacian and calculate its Eigenvalues and Eigenvectors
- Use the eigenvectors that correspond to the smallest non-zero eigenvalues as the coordinates of the nodes
- Run K-Means clustering on those node coordinates
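The three steps above can be sketched on a small made-up graph of two triangles joined by a single bridge edge. For a two-group split, thresholding the Fiedler eigenvector at zero stands in for K-Means (this shortcut is my simplification, not the full pipeline):

```python
import numpy as np

# two dense triangles with one sparse bridge between them (made up)
edges = [(0, 1), (0, 2), (1, 2),   # cluster one
         (3, 4), (3, 5), (4, 5),   # cluster two
         (2, 3)]                   # the bridge
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A     # step 1: build the Laplacian

vals, vecs = np.linalg.eigh(L)     # eigenvalues come back in ascending order
fiedler = vecs[:, 1]               # step 2: eigenvector of the smallest
                                   # non-zero eigenvalue

labels = (fiedler > 0).astype(int) # step 3: for two groups, split on the sign
print(labels)                      # nodes 0-2 get one label, 3-5 the other
```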
As an example I made a network of three major clusters with sparse connections between the different clusters.
After making the network’s Laplacian and calculating its eigenvalues, I found the Fiedler eigenvector and plotted it below.
Notice how nodes 0 through 9 are in the range between 0.25 and 0.3, nodes 10 through 29 are around -0.1, and nodes 30 through 50 are around -0.02. The sparse connections between the clusters result in the bridge nodes having values that are pulled toward the center, like nodes 0, 2, 3, 4, 15, 24, and 35.
Now we can use the values of the Fiedler eigenvector as coordinates in a K-Means clustering algorithm to label our clusters.
After training, the K-Means clustering model achieved 100% accuracy in labelling the given network.
Note: Newman presents a binary partition problem, in which the nodes are only assigned to two classes.
Gradient Descent Spectral Layout: The spectral layout of a graph is a visualization method that uses the values of eigenvectors as the coordinates for the nodes in the frame.
In this little experiment I use a variant of the spectral layout to place nodes. The layout requires the minimization of Δ² = xTLx, so I cheated and used that objective to make a gradient descent version. I use the formula x’ = x - α∇(Δ²) to approach the minimum value of Δ². And with a wave of my magic calculus wand… poof… since ∇(Δ²) = 2Lx, the update x’ = x - 2αLx will be the equation I use to iterate to new points.
Starting with random placements of nodes we get the ugly mess above. After 1000 iterations we get a much cleaner looking graph.
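A minimal sketch of that iteration, on a made-up graph, with a step size α I chose by hand. Two lines inside the loop are my own additions: without them, plain gradient descent slowly collapses every node onto the trivial minimum (the all-ones direction), so I recenter and rescale the layout each step:

```python
import numpy as np

# a small made-up graph: two triangles sharing a bridge at node 2-3
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

rng = np.random.default_rng(0)
pos = rng.standard_normal((n, 2))        # random starting placement
alpha = 0.05                             # hand-picked step size

for _ in range(1000):
    pos = pos - 2 * alpha * (L @ pos)    # the update x' = x - 2αLx, per axis
    pos -= pos.mean(axis=0)              # my addition: remove the all-ones
                                         # component (the trivial minimum)
    pos /= np.linalg.norm(pos, axis=0)   # my addition: keep the layout from
                                         # shrinking to a point

# pos[i] is now the (x, y) position of node i
```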
Spectral Layout: Uses the eigenvectors for the two smallest non-zero eigenvalues as the coordinates for your layout.
The standard Spectral layout uses the global minimum values of Δ² to determine the coordinates of nodes.
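The standard spectral layout takes only a few lines, shown here on a small graph I made up for illustration:

```python
import numpy as np

# a made-up graph: two triangles joined by the edge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

vals, vecs = np.linalg.eigh(L)  # ascending eigenvalue order
coords = vecs[:, 1:3]           # eigenvectors of the two smallest
                                # non-zero eigenvalues
# coords[i] is the (x, y) position of node i
```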
See how easily separated each group of nodes is in the figure below. That separability is what makes this type of analysis useful for spotting clusters, and it forms the basis of the Spectral Partitioning we talked about before.
Notice that the gradient descent layout and the standard Spectral layout differ in many of their placements. However, you should also see that they share the same features: clusters stretched along lines and branching into other clusters. The difference is a consequence of the fact that my gradient descent layout converges on a local (not global) minimum of Δ². It effectively singles out the closest pair of L’s eigenvectors and uses those as coordinates instead of finding the global minimum.
Random Walks and Diffusion on Networks
As my last example I’ll use the Laplacian to derive the likelihood of a random walk ending on a given node. A random walk on a network consists of an agent starting at one node and repeatedly stepping to a randomly chosen neighbor.
Let’s start with the Karate Club as our example network. What we will do is initialize our variable vector x with the value 1 at node 16 (which was chosen because it is peripheral) and 0 everywhere else.
The vector x’s values will diffuse through the network step by step via the equation x’ = AD^-1x. With each iteration the values of x converge toward the stable distribution on the network, which we can find by solving another eigenvector problem.
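To make the iteration concrete, here is a minimal sketch on a small made-up 4-node graph (not the Karate Club), with all the probability mass starting on node 0:

```python
import numpy as np

# a made-up connected graph: a triangle (0, 1, 2) with a pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
D_inv = np.diag(1.0 / A.sum(axis=1))  # D^-1

x = np.array([1.0, 0.0, 0.0, 0.0])    # all mass starts on node 0
for _ in range(1000):
    x = A @ D_inv @ x                  # one diffusion step: x' = AD^-1x

print(x)  # converges to degree / total degree
```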
The Eigenvector Distribution: The stable distribution of x over the network can be derived from the equation x = AD^-1x,
which, by moving the terms on the right over, gives x - AD^-1x = 0.
Pulling out the identity matrix: Ix - AD^-1x = (I - AD^-1)x = (D - A)D^-1x = 0,
and voila… LD^-1x = 0, another eigenvalue problem with the Laplacian!
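We can check this numerically: since D^-1 applied to the degree vector gives the all-ones vector, and L times the all-ones vector is 0, a vector x proportional to the degrees solves LD^-1x = 0. A quick verification on the same kind of made-up graph:

```python
import numpy as np

# a made-up connected graph: a triangle (0, 1, 2) with a pendant node 3
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = A.sum(axis=1)                  # degree vector
D = np.diag(d)
L = D - A

x = d / d.sum()                    # stable distribution: proportional to degree
print(np.allclose(L @ np.linalg.inv(D) @ x, 0))  # True: LD^-1x = 0
```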
References
- Networks by Mark Newman
- Introduction to the Modeling and Analysis of Complex Systems by Hiroki Sayama