2024-08-19
Machine Heart Report
Editor: Jiaqi
Matrices can be hard to understand, but changing your perspective may make all the difference.
When learning mathematics, we are often frustrated by how difficult and abstract the material is; yet sometimes, simply changing our perspective reveals a simple and intuitive solution. For example, when we first learned the formula for the square of a sum, (a+b)², we may not have understood why it equals a² + 2ab + b². We just knew that it was written in the book and that the teacher told us to memorize it; until one day we saw this animated picture:
Suddenly it dawned on us: the formula can be understood from a geometric perspective!
Now, that feeling of sudden enlightenment has come again: a non-negative matrix can be equivalently converted into a corresponding directed graph!
As shown in the figure below, the 3×3 matrix on the left can be equivalently represented as the directed graph with three nodes on the right, and this representation is very helpful for both matrix theory and graph theory.
This example comes from Tivadar Danka, a mathematician committed to making math accessible to everyone. This mathematician, who calls himself "chaotic good", vividly introduces this equivalence between matrices and graphs, and its uses, through a series of tweets and blog posts. So far, these tweets have been read more than 2 million times, retweeted more than 3,200 times, and bookmarked 9,100 times.
The equivalence between matrices and directed graphs
As the example above shows, if we regard each row as a node, then each non-zero element can be represented as a directed, weighted edge; zero elements are simply omitted. The element in the i-th row and j-th column corresponds to an edge from node i to node j.
At first glance this may seem complicated, but we can start by looking at one of the nodes.
As shown in the figure, for this 3×3 matrix, the first row corresponds to the topmost node (here we call it node 1). The row contains 3 elements but one of them is 0, so two edges extend from this node. The yellow edge represents the element 0.5 at (1,1), so it is a directed edge from node 1 to itself with weight 0.5. Similarly, the blue edge points to node 2 with weight 1.
By the same reasoning, we can see that the i-th column of the matrix corresponds to all the edges pointing into node i.
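To make this mapping concrete, here is a minimal Python sketch (using numpy) that converts a non-negative matrix into a weighted edge list. Note that the text above only states the example matrix's first row (0.5, 1, 0) and first column (0.5, 0.2, 1.8); the remaining entries below are illustrative placeholders, not Danka's actual values.

```python
import numpy as np

# Hypothetical example matrix: the first row and first column follow the
# article's figures; the other entries are placeholders for illustration.
A = np.array([
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
    [1.8, 0.0, 0.0],
])

# Row i is node i; each non-zero entry A[i, j] becomes an edge i -> j
# whose weight is the entry itself (zero entries are omitted).
edges = [(i, j, A[i, j])
         for i in range(A.shape[0])
         for j in range(A.shape[1])
         if A[i, j] != 0]

for i, j, w in edges:
    print(f"node {i + 1} -> node {j + 1}  (weight {w})")
```

Printed with 1-based indices to match the text, node 1's two edges come out as 1 → 1 (weight 0.5) and 1 → 2 (weight 1.0), exactly as described above.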
What is the use of this equivalence representation?
This equivalence between non-negative matrices and directed graphs can help us better understand matrices and their operations and simplify some calculations; in turn, this can also help us understand graphs from a new perspective.
For example, powers of a matrix correspond to walks in a graph.
As shown in the figure above, for the k-th power of an n×n square matrix A, the entry in row i and column j is a sum whose terms range over all possible k-step walks from node i to node j.
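Written out explicitly (a standard identity, included here for completeness), the (i, j) entry of the k-th power is

$$
(A^k)_{ij} = \sum_{l_1, l_2, \ldots, l_{k-1}} a_{i, l_1}\, a_{l_1, l_2} \cdots a_{l_{k-1}, j},
$$

where each term of the sum is the product of the edge weights along one k-step walk from node i to node j.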
For example, suppose we want to calculate the square of the above 3×3 matrix.
If we use matrix multiplication, we need to calculate like this:
For the first entry of the result, we get 0.5×0.5 + 1×0.2 + 0×1.8 = 0.45. Repeating this for every entry gives the complete result:
However, if we use the graph-walk view described above, we can read the result off the walk paths. For the first entry of the result matrix, we sum the weight products a_{1,l} · a_{l,1} over all 2-step walks from node 1 back to node 1.
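The sketch below checks this equivalence numerically on the hypothetical matrix introduced earlier: the first entry of A² computed by matrix multiplication matches the sum over all 2-step walks from node 1 back to node 1.

```python
import numpy as np

# Same hypothetical matrix as before (only the first row and first
# column are taken from the article; the rest is illustrative).
A = np.array([
    [0.5, 1.0, 0.0],
    [0.2, 0.0, 1.0],
    [1.8, 0.0, 0.0],
])

# Entry (1, 1) of A^2 via ordinary matrix multiplication.
print((A @ A)[0, 0])  # 0.45

# The same entry via 2-step walks 1 -> l -> 1, summing the
# products of edge weights over every intermediate node l.
print(sum(A[0, l] * A[l, 0] for l in range(A.shape[0])))  # 0.45
```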
Moreover, if this directed graph represents the states of a Markov chain, then the square of its transition probability matrix gives, in essence, the probabilities of the chain reaching each state after 2 steps.
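As a small sketch of that probabilistic reading (with a made-up 3-state transition matrix, not one from the article):

```python
import numpy as np

# Hypothetical transition matrix of a 3-state Markov chain;
# each row sums to 1.
P = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.3, 0.5],
    [0.9, 0.0, 0.1],
])

P2 = P @ P
print(P2[0, 2])        # probability of going from state 1 to state 3 in exactly 2 steps
print(P2.sum(axis=1))  # each row of P^2 still sums to 1
```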
Not only that: representing matrices as graphs also gives us a deeper understanding of the structure of non-negative matrices. To see this, Danka says, we first need to understand the concept of "strongly connected components".
Strongly connected components
What is a strongly connected component? For a directed graph, if every node can be reached from every other node, we say the graph is strongly connected, as shown in the figure below.
A strongly connected component is a maximal subgraph of a directed graph that is itself strongly connected. As shown in the figure below, there is one strongly connected component on each side, while the white edge in the middle does not belong to any strongly connected component.
The following figure shows another example, where the yellow part is the strongly connected component:
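In code, strongly connected components are easy to compute; for instance, the networkx library implements the standard algorithms directly. The graph below is a made-up example with two components joined by a one-way bridge edge, mirroring the structure in the figures above.

```python
import networkx as nx

# Two directed cycles joined by a single one-way edge 2 -> 3.
G = nx.DiGraph([
    (0, 1), (1, 2), (2, 0),  # first cycle: component {0, 1, 2}
    (2, 3),                  # bridge edge: belongs to no component
    (3, 4), (4, 3),          # second cycle: component {3, 4}
])

print(nx.is_strongly_connected(G))  # False: no path from 3 back to 0
print([sorted(c) for c in nx.strongly_connected_components(G)])
# [[3, 4], [0, 1, 2]] (order may vary)
```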
The matrix corresponding to a strongly connected graph is irreducible; every other non-negative matrix is reducible.
Danka explains this with an example. (For simplicity, all weights in the example are unit weights, but in practice they can be any non-negative value.)
Let's transcribe this graph, which contains strongly connected components but is not itself strongly connected, into its corresponding matrix form:
And this matrix is a reducible matrix.
It can be seen that the two sub-matrices on the main diagonal represent the two strongly connected components, while the upper-right sub-matrix represents the edges from the first strongly connected component to the second, and the lower-left sub-matrix represents the edges from the second strongly connected component back to the first (since there are no such edges, it is all zeros).
This form of writing block matrices is called Frobenius normal form.
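In general (a standard statement of the form; the figure above shows the two-component case), a matrix in Frobenius normal form is block upper triangular,

$$
\begin{pmatrix}
A_1 & \ast & \cdots & \ast \\
0 & A_2 & \cdots & \ast \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & A_k
\end{pmatrix},
$$

where each diagonal block A_i is irreducible, i.e., it is the matrix of one strongly connected component.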
So, it is natural to ask: Can we convert any non-negative matrix into Frobenius normal form?
By using directed graphs to represent non-negative matrices, we can easily see that the answer is yes, because the directed graph of any non-negative matrix can be decomposed into strongly connected components connected to one another. The process is very simple:
That’s it!
Using the graph to get the Frobenius normal form
So, what is this better, graph-based way? Let's walk through the process using the example above.
First, we merge each strongly connected component into a single object, as shown in the figure below. At this point we can treat each strongly connected component as a black box: we don't care about its internal structure, only its external connections.
Then, in this new graph, we need to find the components that have only outgoing edges and no incoming edges. In this specific example, there is only one, and we will label it number 0:
The next step is trickier: number each component by its distance from component 0, measured along the longest path. The following example makes this clearer:
As you can see, there are two paths from component 0 to the middle component, so we take the longer one to determine its number. In the end we get:
This, in fact, defines an ordering of the components. Next, we relabel the nodes inside each component:
If the graph itself came from a matrix, then this relabeling process yields a matrix in Frobenius normal form!
In fact, this relabeling amounts to transforming the original matrix as PAPᵀ, where the permutation matrix P is a product of transpositions.
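The whole procedure can be sketched in a few lines of Python with networkx, whose condensation function performs exactly the "merge each component into a single object" step. One caveat: for simplicity the sketch orders the components by a topological sort of the condensation, which is one valid ordering; the article's distance-from-0 numbering is another way of producing such an order. The matrix below is made up (unit weights, as in the article's example).

```python
import numpy as np
import networkx as nx

# A hypothetical reducible non-negative matrix: a 3-cycle {0, 1, 2},
# a 2-cycle {3, 4}, and a single bridge edge 3 -> 0.
A = np.array([
    [0, 1, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [1, 0, 0, 0, 1],
    [0, 0, 0, 1, 0],
], dtype=float)

G = nx.from_numpy_array(A, create_using=nx.DiGraph)

# Step 1: merge each strongly connected component into a single node.
C = nx.condensation(G)  # a DAG; C.nodes[c]["members"] holds the original nodes

# Step 2: order the components so that all edges point "forward", then
# list the original nodes component by component.
order = [v for c in nx.topological_sort(C)
           for v in sorted(C.nodes[c]["members"])]

# Step 3: apply the permutation: (P A P^T)[i, j] = A[order[i], order[j]].
P = np.eye(len(order))[order]
print(P @ A @ P.T)  # block upper triangular: the Frobenius normal form
```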
Here is the full form of the theorem:
Of course, the uses of representing matrices with graphs go far beyond this. For example, we can use the eigenvalues of a matrix to define the eigenvalues of a graph; in fact, this idea gave rise to the research field of spectral graph theory.
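For a taste of that idea, here is a tiny sketch with a made-up graph: one common definition takes the eigenvalues of a graph to be the eigenvalues of its adjacency matrix.

```python
import numpy as np
import networkx as nx

G = nx.cycle_graph(4)        # an undirected 4-cycle
A = nx.to_numpy_array(G)     # its adjacency matrix
print(np.linalg.eigvals(A))  # approximately [2, 0, -2, 0]
```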
Conclusion
Clearly, this equivalence between matrices and graphs is helpful for graph theory research and also provides a new perspective for linear algebra computation and analysis. It has important practical uses, too: for example, DNA data is often represented in the form of matrices or graphs.
In addition, we all know how important matrix operations are to today's large-model AI, while graphs, exemplified by knowledge graphs, are becoming an important boost to AI through technologies such as retrieval-augmented generation. Linking the two may bring new breakthroughs in AI explainability and graph-based AI. At the very least, it can help us learn linear algebra better.
In fact, the above content is excerpted from the book "Mathematics of Machine Learning" that Tivadar Danka is writing. The book introduces the mathematics behind machine learning from the basics to advanced topics, so that readers truly understand not just the facts but the reasons behind them. Danka confidently claims it will be "the best resource for learning machine learning." He has already released two preview chapters online; interested readers can visit: https://tivadardanka.com/mathematics-of-machine-learning-preview/