Deep Learning using Nonequilibrium Thermodynamics

$P$: permutation matrix 代表 binary square matrix 每一行和每一列只有一個 1, 其餘為 0.
例如 $P= \left[\begin{array} {cc}1 & 0 & 0 & 0\\ 0 & 0 & 0 & 1 \\0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0\end{array}\right]$, $P$ 對應一個置換。可以說是置換群的一個 element.
注意上式代表 $A_1$, $A_2$ 是相似矩陣 (similar matrix), 有相同的 (1) rank; (2) eigenvalues (but not eigenvectors); (3) determinant; (4) trace. 不過反之不一定成立。因爲相同的 rank, eigenvalues, determinant, trace 只代表相似矩陣，不一定是permutation matrix.

圖譜和特徵值 Graph Spectrum and Eigenvalues

同構的圖有相同的特徵值，就像指紋。因此 eigenvalues 也稱為 graph spectrum (圖譜)。

如果兩個圖有相同的 spectrums 代表一定是同構的圖嗎？NO http://www.cs.yale.edu/homes/spielman/561/lect08-18.pdf 不過有辦法排除。

判斷 Connectivity Via Fourier Transform 適用無向圖 Laplacian

判斷連通 (connected graph, very low frequency (11111, -1-1-1), dc = 0)

無向圖 $G = (V, E)$ 的 Laplacian matrix $L$ 是對稱並且 diagonal dominant -> positive semi-definite matrix：

所有 eigenvalues (稱為 spectrum) 都是正實數。
$\lambda_{n−1} \ge …\ge \lambda_{1} \ge \lambda_{0}=0$
$x = [1,1,…,1]^T$ 對應 $Ax=\lambda_0 x$ where $\lambda_0=0$.
比較有意思的是 $\lambda_1$ 對應最少 cut 的 graph partition. $\lambda_1>0$ if and only if (iff) $G$ 是連通圖。
- 如果兩個 subgraphs 不連通，Laplacian $\lambda_1 = \lambda_0 =0$。兩個 subgraphs 對應的 eigen-vectors [111100] 和 [000011] 對應兩個 0 eigenvalues. 依次類推多個不連通 subgraphs.
因為 minimize $yLy^T$, where $y = [1, 1, .. -1, -1]^T$? (To be checked!)

判斷邊 (edge) :

用一些簡單 (高度對稱) 的 graphs 看 eigenvalues, 從 Spatical Fourier Transform 來想 graph spectrum!

這些對稱 graph 的 Laplacian matrix 都是 ciculant matrix 而且 DC=0, the eigenvalues 就是 frequency transform output, the eigenvectors 就是 sine, cosine waveform!!

Complete graph ($K_n$) 只有 $\lambda_0 = 0$, 其他 $\lambda_{n-1} = \lambda_{n-2} = … = \lambda_1 = n$
- $K_n$ eigenvalue $n$ 的 multiplicity = $n-1$ 似乎反應每一個頂點的 edge 有 $n-1$ 條。
- $K_4 : L = [3, -1 -1 -1;-1, 3 -1 -1;-1 -1, 3 -1;-1 -1 -1, 3]$, eigenvalues = (4,4,4,0)
- 0 對應的 eigenvector = [1, 1, 1, 1]
- (Multiplicity) 3 個 eigenvalue 4 對應的 eigenvectors = [-1 -1 -1 3], …
Star graph ($S_n$) 只有 $\lambda_0 = 0$, $\lambda_{n-2} = \lambda_{n-3} = … = \lambda_1 = 1$, $\lambda_{n-1}=n-2$
- S5 = [4 -1 -1 -1 -1; -1 1 0 0 0; 0 -1 1 0 0; 0 0 -1 1 0; 0 0 0 -1 1]
Ring graph ($R_n$) 只有 $\lambda_0 = 0$, $\lambda_{n-2} = \lambda_{n-3} = … = \lambda_1 = 1$, $\lambda_{n-1}=n-2$
- R5 = [2 -1 0 0 -1; -1 2 -1 0 0; 0 -1 2 -1 0; 0 0 -1 2 -1; -1 0 0 -1 2]
- Eigenvalue = 2 - 2 cos(2pi k / 5) -> (0, 1.38, 3.62, 3.62, 1.38)

判斷二部圖 (Bipartite Graph) : 適用無向圖 Adjacency Matrix

For d-regular graphs, d is the first eigenvalue of A for the vector v = (1, …, 1) (it is easy to check that it is an eigenvalue and it is the maximum because of the above bound). The multiplicity of this eigenvalue is the number of connected components of G, in particular �1>�2 $\lambda _{1}>\lambda _{2}$ for connected graphs. It can be shown that for each eigenvalue �� $\lambda _{i}$ , its opposite −��=��+1−� $-\lambda _{i}=\lambda _{n+1-i}$ is also an eigenvalue of A if G is a bipartite graph.[8] In particular −d is an eigenvalue of any d-regular bipartite graph.

Dynamic Graph (無向圖或有向圖)

除了判斷圖的 connection 之外，矩陣在動態 (time) sequence 非常有用。

Adjacency matrix: count trellis, information, coding.

Adjacency matrix 的變形 transition probability: Monte Carlo Markov chain (MCMC), Hidden Markov Chain (HMC).

Laplacian matrix: thermal diffusion

Walk

Trellis Count: 適用有向圖 Adjacency Matrix

這個在前文介紹過。隨著時間的增加，有更多的 distinct trellis. 見上圖 for Channel A and B.

$S_n = A S_{n-1}$ where $S_n = [a_n, b_n, …]^T$ 是 n-time stamp 的所有 states 的 trellis count.
$M_n = a_n + b_n + … = [1, 1..] S_n$

Channel A: $a_n = b_{n-1}$; $b_n = a_{n-1} + b_{n-1}$

\[\left[\begin{array} {cc} a_n\\ b_n\end{array}\right]= \left[\begin{array} {cc}0 & 1\\ 1 & 1\end{array}\right] \left[\begin{array} {cc} a_{n-1}\\ b_{n-1}\end{array}\right]\]

\[\begin{aligned} M_n &= [1\,1] \left[\begin{array} {cc} a_n\\ b_n\end{array}\right]= [1\,1] \left[\begin{array} {cc}0 & 1\\ 1 & 1\end{array}\right] \left[\begin{array} {cc} a_{n-1}\\ b_{n-1}\end{array}\right] = [1\,1] \left[\begin{array} {cc}0 & 1\\ 1 & 1\end{array}\right]^n \left[\begin{array} {cc} a_0\\ b_0\end{array}\right] \\ &= [1\,1] \left[\begin{array} {cc}0.53 & -0.85\\ 0.85 & 0.53 \end{array}\right] \left[\begin{array} {cc}1.62^n & 0\\ 0 & (-0.62)^n \end{array}\right] \left[\begin{array} {cc}0.53 & 0.85\\ -0.85 & 0.53 \end{array}\right]\left[\begin{array} {cc} 0\\ 1\end{array}\right] \\ &\simeq_{n\to\infty} [1\,1] \left[\begin{array} {cc}0.53 & -0.85\\ 0.85 & 0.53 \end{array}\right] \left[\begin{array} {cc}1.62^n & 0\\ 0 & 0 \end{array}\right] \left[\begin{array} {cc}0.53 & 0.85\\ -0.85 & 0.53 \end{array}\right]\left[\begin{array} {cc} 0\\ 1\end{array}\right] \\ &=1.62^n [1\,1] \left[\begin{array} {cc}0.53 & -0.85\\ 0.85 & 0.53 \end{array}\right] \left[\begin{array} {cc}1 & 0\\ 0 & 0 \end{array}\right] \left[\begin{array} {cc}0.53 & 0.85\\ -0.85 & 0.53 \end{array}\right]\left[\begin{array} {cc} 0\\ 1\end{array}\right] \\ &=1.62^n [1\,1] \left[\begin{array} {cc}0.53 & 0\\ 0.85 & 0 \end{array}\right] \left[\begin{array} {cc}0.53 & 0.85\\ -0.85 & 0.53 \end{array}\right]\left[\begin{array} {cc} 0\\ 1\end{array}\right] =1.62^n [1\,1] \left[\begin{array} {cc}0.53^2 & 0.53\cdot 0.85\\ 0.85\cdot 0.53 & 0.85^2 \end{array}\right] \left[\begin{array} {cc} 0\\ 1\end{array}\right] \\ &= 1.62^n (0.53\cdot 0.85+0.85^2) = 1.258 \cdot 1.62^n \end{aligned}\]

Channel B: $a_n = a_{n-1} + b_{n-1}$; $b_n = d_{n-1}$; $c_n = a_{n-1}$; $d_n = c_{n-1} + d_{n-1}$

\[\left[\begin{array} {cc} a_n\\ b_n \\ c_n \\ d_n \end{array}\right]= \left[\begin{array} {cc}1 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \\1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1\end{array}\right] \left[\begin{array} {cc} a_{n-1}\\ b_{n-1} \\ c_{n-1} \\ d_{n-1} \end{array}\right]\]

\[\begin{aligned} M_n &= [1 1 1 1 ]\left[\begin{array} {cc} a_n\\ b_n \\ c_n \\ d_n \end{array}\right]= [1 1 1 1 ] \left[\begin{array} {cc}1 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \\1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1\end{array}\right] \left[\begin{array} {cc} a_{n-1}\\ b_{n-1} \\ c_{n-1} \\ d_{n-1} \end{array}\right] = [1 1 1 1 ] \left[\begin{array} {cc}1 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \\1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1\end{array}\right]^n \left[\begin{array} {cc} a_0\\ b_0 \\ c_0 \\ d_0 \end{array}\right]\\ &= [1 1 1 1 ] \left[\begin{array} {cc}1 & 1 & 0 & 0\\ 0 & 0 & 0 & 1 \\1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 1\end{array}\right]^n \left[\begin{array} {cc} 0\\ 0 \\ 0 \\ 1 \end{array}\right] = [1 1 1 1 ] Q \left[\begin{array} {cc} (1.62)^n & 0 & 0 & 0\\ 0 & \lambda_2^n & 0 & 0 \\0 & 0 & \lambda_3^n & 0 \\ 0 & 0 & 0 & (-0.62)^n\end{array}\right] Q^{-1} \left[\begin{array} {cc} 0\\ 0 \\ 0 \\ 1 \end{array}\right] \\ &= 1.62^n \cdot \text{const} \end{aligned}\]

簡單來説：在 $n$ 很大時，Trellis count $\simeq \lambda_{max}^n \cdot \text{const}$

Markov Chain: 適用有向圖 Transition Probability Matrix

Diffusion Map (無向圖) Laplacian Matrix

Directed Graph Adjacency Matrix for Diffusion?

Laplacian Matrix for Diffusion?

Directed graph

Asymmetric, real or image eigenvalues
Distinguishable paths (messages)
- Related to Markov chain?

加上 probability

Directed graph + Transition probability = Markov chain

Random graph

Laplacian graph

Discrete Laplacian Operator on Graph

最先把拉普拉斯算子引入圖論 (graph theory) 非常有創意。兩者看起來沒什麼關係。拉普拉斯算子廣泛用於物理現象的計算，例如熱傳導，電磁場，量子力學等等。這樣的算子居然可以用於圖論的 graph partition, manifold learning, 實在是出乎意料。

不過有一個類比可以參考 (appendix)：電磁學 (based on Maxwell equation: Laplacian wave equation) 到電路學 (based on KCL and KVL: discrete graph).

參考以下的 reference for more: The Mathematical Foundations of Manifold Learning [@melas-kyriaziMathematicalFoundations2020]

拉普拉斯矩陣 [@wikiLaplacianMatrix2019]

最常見的物理公式和 discrete Laplacian 的類比。 1. 熱傳導 (heat diffusion) $\Delta \varphi(\vec{r},t) = -\frac{1}{c}\frac{\partial}{\partial t}\varphi(\vec{r},t)\quad c\text{ is conductivity}$

The Laplacian matrix can be interpreted as a matrix representation of a particular case of the discrete Laplace operator. Such an interpretation allows one, e.g., to generalise the Laplacian matrix to the case of graphs with an infinite number of vertices and edges, leading to a Laplacian matrix of an infinite size.

Suppose $\phi$ describes a heat distribution across a graph, where $\phi_i$ is the heat at vertex $i$. According to Newton’s law of cooling, the heat transferred between nodes $i$ and $j$ is proportional to $\phi_i - \phi_j$. if nodes $i$ and $j$ are connected (if they are not connected, no heat is transferred). Then, for heat capacity $k$, $\begin{aligned} \frac{d \phi_{i}}{d t} &=-k \sum_{j} A_{i j}\left(\phi_{i}-\phi_{j}\right) \\ &=-k\left(\phi_{i} \sum_{j} A_{i j}-\sum_{j} A_{i j} \phi_{j}\right) \\ &=-k\left(\phi_{i} \operatorname{deg}\left(v_{i}\right)-\sum_{j} A_{i j} \phi_{j}\right) \\ &=-k \sum_{j}\left(\delta_{i j} \operatorname{deg}\left(v_{i}\right)-A_{i j}\right) \phi_{j} \\ &=-k \sum_{j}\left(\ell_{i j}\right) \phi_{j} \end{aligned}$

另一個角度是用 KCL and KVL 類比。注意電路學仍然有 time varying part, quasi-static, 參考 appendix. Voltage = $\phi$, conductance=k, $I_{ij} = k(\phi_i-\phi_j)$

所以 Laplace matrix $L = D - A$, $D$ 是 diagonal degree matrix, $A$ 是 adjacent matrix.

基本解就是 Helmholtz equation: $\Delta \varphi + \lambda \varphi = 0$

Laplacian Matrix 特性

Assuming graph G(V, E) with n vertex

$\lambda_{n-1} \ge …\ge\lambda_2\ge\lambda_1\ge\lambda_0=0$
因為 x=(1,1,…,1)’ 對應 $Lx=\lambda_0 x$, and $\lambda_0=0$.
$\lambda_1$ 對應最少 cut 的 graph partition. 所以 $\lambda_1=0$ 代表 graph 有不連通的 subgraph.
因為 minimize yLy’, where y = (1, 1, .. -1, -1)?

Laplacian matrix with kernel

前面假設 equal weights/conductance, $k$, for all edges. 在一些應用比較好的方式是不同的 weights (conductance), between 0 and 1.

如果每個 vertex pair 都有距離，可以用 heat kernel $w_{ij} = \exp(-\frac{|x_i-x_j|^2}{c})$. W 定義好，D 就是 diagonal matrix, $d_{ii} = \sum{w_{ij}}$. and $L=D-W$. 同樣 $\lambda_0 = 0$, 對應的 eigenvector = (1,1,..1)’.

如果要計算所有的 vertex pair, 太多 pairs. 實務上有兩個方法 truncate edge: (a) $\epsilon$-neighborhoods: $|x_i-x_j|^2 < \epsilon \to w_{ij} = 0$; (b) machine learning n nearest neighborhood method (K=5, 6, 7, etc.).

Laplacian Matrix in Machine Learning

第一步是建立 adjacent graph，用 matrix 表示。 如果要計算所有的 vertex pair, 太多 pairs. 實務上有兩個方法 truncate edge: (a) $\epsilon$-neighborhoods: $|x_i-x_j|^2 < \epsilon$. 優點：本於幾何；對稱 matrix. 缺點：需要選擇 $\epsilon$, 太大得到 disconnected graph, 太小則太多 edges. (b) machine learning n nearest neighborhood method (K=5, 6, 7, etc.). 缺點：less geometrically intuitive.

第二步是選擇 weights, 建立 similarity matrix $W$, 同樣有兩個方式。 (a) heat kernel. 如果 (i, j) is connected, $W_{ij} = e^{-\frac{|x_i-x_j|^2}{t}} $ (b) 只用 0 and 1.

第三步 (eigenmaps): 定義 $D_{ii} = \sum_j W_{ji}$. $L = D-W$ 是拉普拉斯矩陣。為什麼如此定義？從上述的例子以及 KCL 可以看出就是 $\nabla\cdot \mathbf{J}=0$. $W_{ij}$ 就是 vertex i 和 j 的 conductance!! Laplace eigenmap (LE) 就是解下列方程式的 eigenvalues!

\[L\mathbf{f} = \lambda D\mathbf{f} \quad\equiv\quad (LD^{-1})(D\mathbf{f}) = \lambda(D\mathbf{f}) \quad\equiv\quad (D^{-1}L)\mathbf{f} = \lambda\mathbf{f} \\ \quad\equiv\quad (D^{-1/2}LD^{-1/2})(D^{1/2}\mathbf{f}) = \lambda(D^{1/2}\mathbf{f})\]

為什麼等式得右邊要加上 D? 其實是計算 $D^{-1/2}LD^{-1/2} = I - D^{-1/2}WD^{-1/2}$ 的 eigenvalues. (Eigenvector scales by $D^{1/2}$.)

Normalize $L$ matrix diagonal elements 為 1.
$D^{-1/2}LD^{-1/2}$ 是對稱 matrix, 保證所有 eigenvalues $\ge$ 0.
Minimize $\sum_{ij} (x_i - x_j)^2 W_{ij}$. 對應電路理論的最低功耗, $\sum_k V_k^2/R_k=\sum_k V_k^2 G_k$, 或是 spring network 的最低勢能如下。
重點是設定 boundary condition, 以及排除 trivial solution.

拉普拉斯矩陣，So What?

Optimal embedding: Let y be a same dimension embedding of x. Why? 因為 $W_{ij} \sim |x_i-x_j|^2$, 只是距離的函數。如果所有 $x_i$ 平移結果仍然不變。所謂 optimal embedding 就是 minimize energy/power representation. $\text{minimize}\quad \frac{1}{2}\sum_{ij} (y_i - y_j)^2 W_{ij} = \mathbf{y}^T L \mathbf{y}$ 變成 $\text{argmin}_\mathbf{y} \mathbf{y}^T L \mathbf{y} \qquad \text{subject to}\quad \mathbf{y}^T D \mathbf{y}=1 \quad\text{and}\quad \mathbf{y}^T D \mathbf{1}=0$ 為了避免 scaling factor, 加上 $\mathbf{y}^T D \mathbf{y}=1$ constraint (boundary condition); 另外為了避免選到 $\lambda_0=0$ 和對應的 eigenvector 1, 加上另一個必須和 1 垂直的 eigenvector 的限制 $\mathbf{y}^T D \mathbf{1}=0$. 如果用 $D=I$ identity matrix 更容易理解。
結果就是次小的 eigenvalue $\lambda_1$ (exclude $\lambda_0=0$).
這是 Courant-Fischer theorem 的推廣：[@spielmanLaplacianMatrices2011]
$\lambda_1 > 0$ if and only if G is connected.
以上是在最小 eigenvalue (exclude 0) 對應的 eigenvector 投影。這是 1-dimension case. 可以推廣 1-dimension to m-dimensions Euclidean embedding, 就是用更多的 eigenvectors. 參考 reference.
降到幾維就是選幾個 eigenvectors 做投影。
PCA and MDS? 是選大的 eigenvalues, why the difference?

Laplacian Eigenmaps (LE)

[@belkinLaplacianEigenmaps2003] relationship to spectra clustering eigenvalue and eigenvector of L

第一步是建立 adjacent graph，用 matrix 表示。 For each $x_i$, find the n nearest neighbors.

第二步是選擇 weights in high dimension use heat kernel, 建立 similarity matrix, $W$

第三步是定義 $D_{ii} = \sum_j W_{ji}$. $L = D-W$ 是拉普拉斯矩陣。 更重要的是 normalized 拉普拉斯矩陣 $D^{-1/2}LD^{-1/2} = I - D^{-1/2}WD^{-1/2}$ 的 eigenvalues. (Eigenvector scales by $D^{1/2}$.) Embedding 就是取 k lowest eigenvalues 對應的 eigenvectors of the matrix: $D^{-1/2}U$.

##Locally Linear Embedding (LLE) LLE: eigenvalue and eigenvector of L^2 LE and LLE are the same thing! LLE 主要的目的是降維。Give a data set $x_1, x_2, … x_k$ in a high-dimension $\mathbb{R}^l$. The goal is to find a low-dimensional representation $y_1, y_2, …, y_k \in \mathbb{R}^m, m \ll k$.

第一步是建立 adjacent graph，用 matrix 表示。 這一步和 LE 一樣。For each $x_i$, find the n nearest neighbors.

第二步是選擇 weights in high dimension, 這和 LE 用 heat kernel 不同 找出 $W_{ij}$ such that $\sum_j W_{ij}x_{i_j}$ equals the orthogonal projection of $x_i$ onto the affine linear space of $x_{i_j}$. 也就是 minimize: $\sum_{i=1}^{l}\left\|\mathbf{x}_{i}-\sum_{j=1}^{n} W_{i j} \mathbf{x}_{i_{j}}\right\|^{2}\qquad\text{subject to}\qquad \sum_j W_{ij} =1$

第三步是計算 embedding Embedding 就是取 k lowest eigenvalues 對應的 eigenvectors of the matrix: $E = (I-W)^T (I-W)$, where E is symmetric PSD matrix. $Ef \approx \frac{1}{2} L^2 f$ $L^2$ 的 eigenvector 和 $L$ 的 eigenvector 一樣。只是 eigenvalues 差平方倍。

Diffusion Maps (DM) Algorithm

Keyword: Markov chain, path integral from quantum mechanics. 前面的ISOMAP、LLE、LE等方法的基本思路都是通过用数据点在欧式空间的局部分布来逼近原流形空间的局部分布，以此推测其全局结构，但是本节将要进行介绍的Diffusion Map则很不一样，它从概率或者说随机漫步(Random Walk)的角度出发，用各个点之间的连通性来描述远近，例如假设点A与点B之间有很多条高概率的路径，之前几种方法几乎都是选择概率最大（距离最近）的一条来定义距离，而Diffusion Map则会把所有这些路径的概率值全部加起来，整个求解过程就像某个对象在整个图模型中进行随机游走，计算最终平稳后落在每个点的概率。 雖然 diffusion map 基本原理或詮釋來自隨機漫步。所有的計算都是 deterministics.

具体实现起来: 第一步是建立 adjacent graph，$\epsilon$-neighborhoods: $|x_i-x_j|^2 < \epsilon$. or (b) k nearest neighborhood graph.

第二步是選擇 weights, 建立 similarity matrix $W$.

第三步定義 $P = D^{-1}W$ transition probability matrix. P矩阵可以认为包含了目标对象从任意一点i经过一步转移到j的概率值，接下来如果我们继续计算P的k次幂，则可以得到从任意一点i经过k步跳转到j的概率值，可以想象，通过不断计算 $P^t$ 我们可以知道整个数据在不同时刻的概率分布变化过程，一般把这个过程称为Diffusion Process。

Diffusion distance and maps

有了上面的Diffusion Process，我们可以定义一个更一般性的diffusion 距离矩阵 $D_{t}\left(X_{i}, X_{j}\right)^{2}=\sum_{u \in X}\left\|p_{t}\left(X_{i}, u\right)-p_{t}\left(u, X_{j}\right)\right\|^{2}=\sum_{k}\left\|P_{i k}^{t}-P_{k j}^{t}\right\|^{2}$

之后就用类似MDS的方法，在新空间中计算一个最能保持这个距离的坐标即可。

另外可以用 transition matrix (with parameter t) 的 eigenvalues $\lambda_i$ and eigenvectors $\psi_i$ 定義 diffusion map: $\Psi_{t}(\boldsymbol{x})=\left(\lambda_{1}^{t} \psi_{1}(\boldsymbol{x}), \lambda_{2}^{t} \psi_{2}(\boldsymbol{x}), \cdots, \lambda_{k}^{t} \boldsymbol{\psi}_{k}(\boldsymbol{x})\right)$

Diffusion distance 和 diffusion map 的關係如下： $D_{t}\left(\boldsymbol{x}_{0}, \boldsymbol{x}_{1}\right)^2=\sum_{j \geq 1} \lambda_{j}^{2 t}\left(\psi_{j}\left(x_{0}\right)-\psi_{j}\left(x_{1}\right)\right)^{2}=\left\|\Psi_{t}\left(\boldsymbol{x}_{0}\right)-\Psi_{t}\left(\boldsymbol{x}_{1}\right)\right\|^{2}$

Diffusion distance and map 都和 $t$ 相關。t 是一個可以調整的參數。
Diffusion distance 可以視為 “soft weighted distance”, 類似量子力學的所有路徑積分。最大的好處是 robust to noise perturbation。這正好是 manifold learning algorithm 最大的弱點。下圖 $D(B,C)<D(A,B)$.
两个随机游走的機率相减是在比较i、j跳转到其它点u的概率分布差异性，例如i、j都有0.3的概率跳转到A，0.1的概率跳转到B，0.6的概率跳转到C，那么i和j的距离就应该是0（相似度高），相反，如果i到A、B、C的概率分别为0.3、0.1、0.6，而j到A、B、C的距离分别为0.3、0.6、0.1，那么i和j的距离应该很远（相似度低），从某种角度上看，Diffusion Map可以算是一种soft版本的LE，同样是局部点的分布影响较大，远的点影响较小，不同在于LE的相似度矩阵 W 是用 ϵ-neighbours graph或者k-nearest neighbours graph 产生的边，而Diffusion Map则通过t来调节扩散的scale。
Diffusion distance = Euclidean distance in diffusion map space (with all n-1 eigenvectors)!
“Diffusion maps” - “eigenvalues” = “Laplacian eigenmaps”. 但是 diffusion maps with parameter t.

The Euclidean distance and the geodesic distance (shortest path) between points A and B are roughly the same as those between points B and C. However, the diffusion distance between A and B is much larger since it has to go through the bottleneck. Thus, the diffusion distance between two individual points incorporates information of the entire set, implying that A and B reside in different clusters, whereas B and C are in the same cluster. This example is closely related to spectral clustering.

The basic algorithm framework of diffusion map is as:

Step 1. Given the similarity matrix L

Step 2. Normalize the matrix according to parameter $\alpha$ :

\[{L^{\alpha}=D^{-\alpha} L D^{-\alpha}} L^{\alpha}=D^{-\alpha} L D^{-\alpha}\]

Step 3. Form the normalized matrix

\[{M=({D}^{\alpha})^{-1}L^{\alpha}}M=({D}^{\alpha})^1L^{\alpha}\]

Step 4. Compute the k largest eigenvalues of $M^{t}$ and the corresponding eigenvectors

Step 5. Use diffusion map to get the embedding $\Psi _{t}$. 另一種方式是計算 diffusion distance $D_t$, 再用類似 MDS 計算 embedding.

比較 PCA, MDS, ISOMAP, LE, LLE, MD (unsupervised learning 降維)

上文已經說到 LE, LLE 基本是同一類算法，基於 graph laplacian 的 最小eigenvalues (0 除外) and eigenvectors. LE and LLE 的想法是降維儘量保持 local relationship characteristics, 但不是 isometry. 白話文就是降維時局部鄰居儘量住在附近，但整體的佈局會很不一樣。Diffusion maps 可視為 “soft weighted” Laplacian eigenmaps, 使用 $t$ 參數控制 diffusion maps and distance, 包含所有可能路徑。robust to noise perturbation. $t$ 可以平衡 local and global information. 另外 weighted by eigenvalues. 結論：LE and LLE (not DM) 重視 local more than global.

PCA 是降維算法的祖師爺和 baseline. PCA 的想法就完全不同！降維儘量保持global information/entropy/variance, 也就是保持最大熵。實際作法先計算 covariance matrix, 再取最大 eigenvalues and eigenvectors. MDS 基本和 PCA 是等價，只是用 distance matrix 替代 covariance matrix (similarity matrix). IOSMAP 是 manifold 的 MDS, 用 geodesic 代替 euclidean distance. 結論：PCA, MDS, ISOMAP 重視 global information/entropy/variance.

另一種分類：PCA and MDS 適用於歐氏空間。ISOMAP/LE/LLE/DM 都是用 neighborhood graph 適用於 manifold space.

下圖是一個 toy example. [@belkinLaplacianEigenmaps2003] 原始資料如下圖左：1000 images of size 40x40, image 是紅色水平長方形或是藍色垂直長方形 at random location (顏色只是視覺效果，並非 labels). 也就是在 1600 dimension 有 1000 random data points (對應 500 水平長方形和 500 垂直長方形）。下圖中是 Laplacian Eigenmaps 降維從 1600-dimension to 2-dimension. 下圖右是傳統 PCA 降維從 1600-dimension to 2-dimension. 明顯在這類 data set (different local structure of clustering) LE 可以分類，但是 PCA 無法分類。

Appendix

電路學的基石：KCL and KVL

電路學的基礎 KCL and KVL (Kirchhoff current/voltage laws) 是 Maxwell equations 在近穩態 (quasi-static) 的簡化解。

首先是 Maxwell equation 和 continuity equation. [@larssonElectromagneticsQuasistatic2007] $\begin{array}{c}{\nabla \cdot \mathbf{E}=\frac{\rho}{\varepsilon_{0}}} \\ {\nabla \cdot \mathbf{B}=0} \\ {\nabla \times \mathbf{E}=-\frac{\partial \mathbf{B}}{\partial t}} \\ {\nabla \times \mathbf{B}=\mu_{0}\left(\mathbf{J}+\varepsilon_{0} \frac{\partial \mathbf{E}}{\partial t}\right)} \\ {\frac{\partial\rho}{\partial t} + \nabla\cdot\mathbf{J}=0} \end{array}$

對應的拉普拉斯算子有兩種：Lorentz gauge 和 Coulomb gauge. [@wikiGaugeFixing2019] Lorentz gauge, $\nabla\cdot\mathbf{A} + \frac{1}{c}\frac{\partial V}{\partial t}=0$ $\begin{aligned} \Delta V-\frac{1}{c^{2}} \frac{\partial^{2}}{\partial t^{2}} V &=-\frac{\rho}{\epsilon_{0}} \\ \Delta \mathbf{A}-\frac{1}{c^{2}} \frac{\partial^{2}}{\partial t^{2}} \mathbf{A} &=-{\mu_{0}}{\mathbf{J}} \end{aligned}$

更簡潔用張量表示 [@wikiMaxwellEquations2019]

定義 EM four-potential (1,0) 張量 $A^{\mu} = \left[ \frac{V}{c}, \mathbf{A}\right]$.
以及 four-current (1,0) 張量 $J^{\mu} = \left[ c\rho, \mathbf{J}\right]$. EM field 變成 (2,0) EM-張量：$F^{\mu\nu} = \partial^{\mu} A^{\nu} - \partial^{\nu} A^{\mu}$. tensor form: $\mathbf{\hat{\hat{F}}} = \nabla\times\mathbf{\hat{A}}$?

Lorentz gauge in Minkowski space：$\partial_{\mu} A^{\mu} = 0$; in curved space: $\nabla_{\mu} A^{\mu} = 0$; tensor form: $\nabla\cdot\mathbf{\hat{A}}=0$

Wave equation (Laplacian operation in 4-D): $\square A^{\alpha} = \partial_{\beta}\partial^{\beta}A^{\alpha}=\mu_o J^{\alpha}$; in curved space: $\nabla_{\beta}\nabla^{\beta}A^{\alpha} - R_{\beta}^{\alpha}A^{\beta}=\mu_o J^{\alpha}$. tensor form: $\Delta\mathbf{\hat{A}} - … = \mu_o \mathbf{\hat{J}}$?

Maxwell field equation using $\mathbf{\hat{\hat{F}}}$, 和 gauge 無關:$\nabla_{\alpha}F^{\alpha\beta} = \mu_o J^{\beta}$ and $ F_{\alpha \beta} =2 \nabla_{[\alpha} A_{\beta]}$

tensor form: $\nabla\cdot\mathbf{\hat{\hat{F}}}=\mu_o \mathbf{\hat{J}}$, and $\nabla\times\mathbf{\hat{\hat{F}}}=0$?

最簡潔用群表示：TBA

Coulomb gauge, $\nabla\cdot\mathbf{A}=0$ (參考座標系使所有電荷運動為 0??? 無法用 covariant tensor!): $\begin{aligned}{\Delta V=-\frac{\rho}{\varepsilon_{0}}} \qquad\qquad\qquad \\ {\Delta \mathbf{A}-\frac{1}{c^{2}} \frac{\partial^{2}}{\partial t^{2}} \mathbf{A}=-\mu_{0} \mathbf{J}+\frac{1}{c^2} \nabla \frac{\partial V}{\partial t}}\end{aligned}$

近穩態就是電路的工作頻率對應的波長大於幾何尺寸，$\frac{c}{f} > L$. 可以忽略時間的微分。只列出電場相關的公式。 $\begin{array}{c}{\nabla \cdot \mathbf{E}=\frac{\rho}{\varepsilon_{0}}} \\ {\nabla \times \mathbf{E}= 0} \\ {\nabla\cdot\mathbf{J}=0} \end{array}$

從 $\nabla \times \mathbf{E}= 0$, 可以得出 $\oint \mathbf{E}\cdot d\mathbf{L} = 0$ 以及 $\mathbf{E}= -\nabla V$ and $V_{ab} = -\int a^b\mathbf{E}\cdot d\mathbf{L}$. 就是 KVL. 從 $\nabla\cdot\mathbf{J}=0$，可以得出 $\iint \mathbf{J}\cdot d\mathbf{S} = I{enc} = 0$. 就是 KCL. 把 $\mathbf{E}= -\nabla V$ 代入 $\nabla \cdot \mathbf{E}\,$, 得到勢能在近穩態的拉普拉斯算子 $\Delta V = -\frac{\rho}{\varepsilon_{0}}$. 一般用在空間正負電荷分離分佈的勢能場計算，例如 dipole, 電容，傳輸線。 電路學假設電中性，整體的基礎是 base on KVL and KCL.

KCL: Nodal analysis + conductance matrix [@wikiNodalAnalysis2018]

一個電路有 N nodes, for node k, KCL: $\sum_{j\ne k} G_{jk}(v_k - v_j) = 0 \to G_{kk}v_k - \sum_{j\ne k} G_{jk}v_j = 0$ $G_{kk}$ 是接到 node k conductance 的總和。如果有額外的 current source $i_k$ 連接到 node k, $i_k = G_{kk}v_k - \sum_{j\ne k} G_{jk}v_j$. 用矩陣表示：$\mathbf{GV = I}$ $\left(\begin{array}{cccc}{G_{11}} & {G_{12}} & {\cdots} & {G_{1 N}} \\ {G_{21}} & {G_{22}} & {\cdots} & {G_{2 N}} \\ {\vdots} & {\vdots} & {\ddots} & {\vdots} \\ {G_{N 1}} & {G_{N 2}} & {\cdots} & {G_{N N}}\end{array}\right)\left(\begin{array}{c}{v_{1}} \\ {v_{2}} \\ {\vdots} \\ {v_{N}}\end{array}\right)=\left(\begin{array}{c}{i_{1}} \\ {i_{2}} \\ {\vdots} \\ {i_{N}}\end{array}\right)$.

因為 KCL 要求所有所有 current source i_k 總和為 0， $\sum_k i_k = 0$. 所以 $G$ 是 singular matrix, i.e. eigenvalue $\lambda_0 = 0$. 一般會把 $v_N = 0$, 把 $N\times N$ sigular matrix 變成 $(N-1)\times (N-1)$ non-sigular matrix.

用一個具體的例子如下。此處已經把 $v_4=0$ 代入，所以得到 3x3 non-singular matrix. 如果要類比 4x4 singular matrix, 就要把 $v_4$ 加入，會得到 4x4 sigular matrix, 每一個 row 的總和為 0! 正如同 Laplacian graph 一樣！ $G_{4x4} : \left[\begin{array}{ccc}{G_1+G_2+G_3} & {-G_2} & {-G_3} & {-G_1}\\ {-G_2} & {G_2+G_4+G_5} & {-G_5} & {-G_4}\\ {-G_3} & {-G_5} & {G_3+G_5} & {0} \\ {-G_1} & {-G_4} & {0} & {G_1+G_4} \end{array}\right]\left[\begin{array}{c}{v_{1}} \\ {v_{2}} \\ {v_{3}} \\ {v_{4}}\end{array}\right]=\left[\begin{array}{c}{i_{A}} \\ {0} \\ {-i_{B}} \\ {-i_A + i_B}\end{array}\right]$

如果把獨立電流源 $i_k$ 改成電容 (with initial condition). $i_k = c_k \frac{d v_k}{dt}$, 等式右邊就變成 V 對時間一次微分：heat equation. 如果再加上 inductor, 等式右邊就多了時間一次積分：damped oscillation equation.

KVL: Loop analysis + impedance matrix

Julia LE and LLE examples

先說結論。 LE 是計算 $D^{-1/2}LD^{-1/2} = U \Lambda U^T$ 的 eigenvalues. Embedding: $D^{-\frac{1}{2}} U$. LLE 是計算 $1/2 L^2 = \hat{U} \hat{\Lambda} \hat{U}^T$ 的 eigenvalues. Embedding: $\hat{U}$. Embedding $D^{-\frac{1}{2}} U$ or $\hat{U}$ 的 columns 是 eigenvectors. 注意 embedding 一定有一個 $\mathbf{1}$ eigenvector, 對應 eigenvalue=0. 降維就是高維的 vector 在 eigenvectors 的投影！除了 $\mathbf{1}$ 以外。如果是降到 1 維，就選次小的 eigenvalue 的 eigenvector.

using LinearAlgebra
W = [0. 0.3 0.7; 0.3 0. 1.7; 0.7 1.7 0.]
D = [1. 0. 0.; 0. 2. 0.; 0. 0. 2.4]
L = D - W
# D^(1/2) and D^(-1/2)
D1_2 = sqrt.(D)
D_1_2 = inv(D1_2)

## Laplace Eigenmap
D_1_2LD_1_2 = D_1_2*L*D_1_2
l_le = eigvals(D_1_2LD_1_2)
U_le = eigvecs(D_1_2LD_1_2)
embed_le = D_1_2 * U_le

## Locally Linear Embedding
LL = 0.5*L*L
l_lle = eigvals(LL)
U_lle = eigvecs(LL)
embed_lle = U_lle

## LE solution here
julia> W
3×3 Array{Float64,2}:
 0.0  0.3  0.7
 0.3  0.0  1.7
 0.7  1.7  0.0

julia> D 
3×3 Array{Float64,2}:
 1.0  0.0  0.0
 0.0  2.0  0.0
 0.0  0.0  2.4

julia> L
3×3 Array{Float64,2}:
  1.0  -0.3  -0.7
 -0.3   2.0  -1.7
 -0.7  -1.7   2.4

julia> D1_2
3×3 Array{Float64,2}:
 1.0  0.0      0.0    
 0.0  1.41421  0.0    
 0.0  0.0      1.54919

julia> D_1_2
3×3 Array{Float64,2}:
 1.0  0.0       0.0     
 0.0  0.707107  0.0     
 0.0  0.0       0.645497

julia> D_1_2LD_1_2 = D_1_2*L*D_1_2
3×3 Array{Float64,2}:
  1.0       -0.212132  -0.451848
 -0.212132   1.0       -0.77594 
 -0.451848  -0.77594    1.0     

julia> l_le = eigvals(D_1_2LD_1_2)
3-element Array{Float64,1}:
 0.0               
 1.1818019484660534
 1.818198051533946 

julia> U_le = eigvecs(D_1_2LD_1_2)
3×3 Array{Float64,2}:
 -0.430331   0.869825  -0.241286
 -0.608581  -0.476988  -0.634123
 -0.666667  -0.126041   0.734622

julia> embed_le = D_1_2 * U_le
3×3 Array{Float64,2}:
 -0.430331   0.869825   -0.241286
 -0.430331  -0.337282   -0.448393
 -0.430331  -0.0813592   0.474196

julia> LL = 0.5*L*L
3×3 Array{Float64,2}:
  0.79    0.145  -0.935
  0.145   3.49   -3.635
 -0.935  -3.635   4.57 

julia> l_lle = eigvals(LL)
3-element Array{Float64,1}:
 0.0
 1.0527010808648647    
 7.797298919135132     

julia> U_lle = eigvecs(LL)
3×3 Array{Float64,2}:
 0.57735   0.808449  -0.114355
 0.57735  -0.503259  -0.64296 
 0.57735  -0.30519    0.757315

julia> embed_lle = U_lle
3×3 Array{Float64,2}:
 0.57735   0.808449  -0.114355
 0.57735  -0.503259  -0.64296 
 0.57735  -0.30519    0.757315

二次式和正定矩陣 Quadratic Form and Positive Definite Matrix

Posted on 2023-01-28 | In Language

Main Reference

二次型與正定矩陣: 線代啟示錄 (wordpress.com)http://www-personal.umich.edu/~mmustata/Slides_Lecture13_565.pdf)

二次式

二次式大概是數學和物理最重要的形式。不論是求解、極值（最大或最小）、非綫性、簡諧運動等等都和二次式相關。矩陣也不例外。

有了二次式就可以定義正定矩陣，Rayleigh Quotient, Courant-Fischer Theorem, etc. 二次式和 eigenvalues 直接相關。

令 $A=\left[a_{i j}\right]$ 為一個 $n \times n$ 實矩陣, $\mathbf{x}=\left[\begin{array}{c}x_1 \ \vdots \ x_n\end{array}\right]$ 為 $n$ 維實向量, 具有以下形式的實函數稱為二次型 (quadratic form) : $f(\mathbf{x})=\mathbf{x}^T A \mathbf{x} 。$

注意：二次型 $\mathbf{x}^T A \mathbf{x}$ 是一個純量。
任意二次型 $\mathbf{x}^T A \mathbf{x}$ 都可以轉換為等價的 $\mathbf{x}^T B \mathbf{x}$, 其中 $B$ 是一個實對稱矩陣：$B=\frac{1}{2}\left(A+A^T\right)$
- 利用一點運算技巧改寫矩陣乘法公式可得

$\begin{aligned} \mathbf{x}^T A \mathbf{x} & =\sum_{i=1}^n \sum_{j=1}^n a_{i j} x_i x_j \\ & =\sum_{i=1}^n \sum_{j=1}^n \frac{1}{2}\left(a_{i j}+a_{j i}\right) x_i x_j \\ & =\mathbf{x}^T\left[\frac{1}{2}\left(A+A^T\right)\right] \mathbf{x} \end{aligned}$

矩陣 $A$ 與 $B=\frac{1}{2}\left(A+A^T\right)$ 有相等的二次型, 不難驗證 $\frac{1}{2}\left(A+A^T\right)$ 是對稱的。例如,

\[\left[\begin{array}{ll} x & y \end{array}\right]\left[\begin{array}{ll} 5 & 4 \\ 2 & 7 \end{array}\right]\left[\begin{array}{l} x \\ y \end{array}\right]=5 x^2+6 x y+7 y^2=\left[\begin{array}{ll} x & y \end{array}\right]\left[\begin{array}{ll} 5 & 3 \\ 3 & 7 \end{array}\right]\left[\begin{array}{l} x \\ y \end{array}\right] 。\]

正定矩陣

定義

令 $A$為一個 $n\times n$ 階實對稱矩陣。若每一 $n$ 維非零實向量 $\mathbf{x}$ 皆使得 $\mathbf{x}^TA\mathbf{x}>0$，我們稱 $A$ 為正定 (positive definite)；若將上述條件放鬆為 $\mathbf{x}^TA\mathbf{x}\ge 0$，則 $A$ 稱為半正定 (positive semidefinite)。如果 $\mathbf{x}^TA\mathbf{x}$可能是正值也可能是負值，則稱 $A$ 是未定的 (indefinite)。

傳統上，我們習慣將對稱性納入正定矩陣的定義，一方面因為實對稱正定矩陣擁有美好的性質，另一個原因是實對稱正定矩陣的分析就足以應付其他一般的正定矩陣。

任意二次型 $\mathbf{x}^T A \mathbf{x}$ 都可以轉換為等價的 $\mathbf{x}^T B \mathbf{x}$, 其中 $B$ 是一個實對稱矩陣：$B=\frac{1}{2}\left(A+A^T\right)$
如果 $A$ 是正定或半正定矩陣，則實對稱矩陣 $B$ 也是正定或半正定矩陣。
如何判斷 $A$ 是正定或半正定矩陣？顯然不可能試所有的 $\mathbf{x}^TA\mathbf{x} > 0$.
- 最直接的方法就是看 eigenvalues. 如果所有 eigenvalues 都大於 0, 為正定矩陣。如果所有 eigenvalues 都大於等於 0, 為半正定矩陣。
- 注意：如果 $A$ 不是對稱矩陣，eigenvalues 有可能是複數。此時判斷 $B = \frac{1}{2}(A+A^T)$ 的 eigenvalues. 因爲 $B$ 是對稱矩陣，所有 eigenvalues 一定都是實數。
- 證明：假設 $A$ 是對稱矩陣，$A = Q D Q^T \to \mathbf{x}^T A \mathbf{x} = \mathbf{x}^T Q D Q^T \mathbf{x} = \mathbf{z}^T D \mathbf{z} = \lambda_1 z_1^2 + … + \lambda_n y_n^2$
  - 如果 $\lambda_k > 0, \, \mathbf{x}^T A \mathbf{x} > 0 \to A $ 是正定矩陣
  - 如果 $\lambda_k \ge 0, \, \mathbf{x}^T A \mathbf{x} \ge 0 \to A $ 是半正定矩陣

幾何意義

考慮 $n=1$ 的情況，矩陣 $A$ 和向量 $\mathbf{x}$ 分別退化為純量 $a$ 和 $x$，如果對任意非零 $x$ 都有 $xax=ax^2>0$。

我們說 $a$ 是正定的，或簡潔地說 $a$ 是正的 ($a>0$)，則 $ax$ 與 $x$ 有相同的正負號。當 $n>1$ 時，令 $\theta$ 為 $A\mathbf{x}$ 與 $\mathbf{x}$ 的夾角，此夾角的餘弦為

\[\cos\theta=\displaystyle\frac{\mathbf{x}^T(A\mathbf{x})}{\Vert\mathbf{x}\Vert~\Vert A\mathbf{x}\Vert}\]

上式中，$A\mathbf{x}$ 與 $\mathbf{x}$ 的內積為正值表示經線性變換後的向量 $A\mathbf{x}$ 與原向量 $\mathbf{x}$ 的夾角小於 $90^{\circ}$。見下圖，$\mathbf{x}$ 為超平面 $P$ 的法向量，正定矩陣 $A$ 保證變換後的向量 $A\mathbf{x}$ 與原向量 $\mathbf{x}$ 都位於超平面 $P$ 的同一側。

實數對稱矩陣 or Complex Hermitian Matrix 定理

特徵值的重要定理：Courant-Fischer min-max theorem

本定理是針對複數 Hermitian matrix 或是實數矩陣, $A = A^*$ (complex matrix) or $A = A^T$ (real matrix)

實數對稱矩陣 (或複數 Hermitian matrix) 有獨特的地位:

Eigenvalues 均爲實數，Eigenvectors 互相正交。證明很容易 $A x = A^* x = \lambda x \to x^* A = \lambda^* x^* \to x^* A x = \lambda^* x^* x \to x^* \lambda x = \lambda^* x x^* \to \lambda = \lambda^*$
EVD 和 SVD 基本一致，不過 eigenvalues and eigenvectors 可能差正負號。代表 invariant eigenvectors 也是正交 basis.
$x^* A x$ ：對於任意 column vector 向量 $x \in \C$, $x^* A x$ 為實數。(這對複數 vector 很重要，對於實數 vector trivial).
- 證明：$x^* A x = x^* Q D Q^* x = z^* D z = \sum_i \lambda_i |z_i|^2 \in \R$ 因爲 $\lambda_i \in \R$
以 2D SVD 爲例，正交 basis 就是單位圓 vector $|x|=1$ 經過 $Ax$ mapping 後產生最大和最小的 vector. 因爲是對稱矩陣，SVD 和 EVD 一致，所以 $\lambda_{max}$ 對應最大的 $|Ax|$, 所以 $\lambda_{min}$ 對應最小的 $|Ax|$
- $\lambda_{max} = \max \frac{x^* A x}{x^* x}$
- $\lambda_{min} = \min \frac{x^* A x}{x^* x}$
那麽 3D, 4D 或更高維的 eigenvalues 又是如何呢？這就是 Courant-Fischer min-max theorem

Rayleigh Quotient

定義 Rayleigh Quotient: $R = \frac{x^* A x}{x^* x}$ where $x \ne 0$

Given $A$ 是 $n \times n$ 的 Hermitian matrix (or real symmetric matrix), eigenvalues 皆爲實數。假設 $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$, 對應的正交 eigenvectors 為 $u_1, u_2, …, u_n$則有

$\max R = \max \frac{x^* A x}{x^* x} = \lambda_1$ when $x = k u_1$
$\min R = \min \frac{x^* A x}{x^* x} = \lambda_n$ when $x = k u_n$

證明：$R = \frac{x^* A x}{x^* x} = \frac{x^* Q D Q^* x}{x^* x} = \frac{z^* D z}{z^* z} = \frac{\sum_i \lambda_i

z_i

^2 }{\sum_i

z_i

}$ 因此 $ \lambda_n \le R \le \lambda_1$

$z = Q^* x$ 並且 $x^* x = z^* z$

直觀看出如果是和 $u_1$ 正交的空間 $(0, u_2, …, u_n )$ 的 Rayleigh Quotient 最大值是 $\lambda_2$, i.e.

$\max_{x \perp u_1} \frac{x^* A x}{x^* x} = \max_{x^* u_1 = 0} \frac{x^* A x}{x^* x} =\max_{x \in (u_2, .. u_n)} \frac{x^* A x}{x^* x} = \lambda_2$
- 同樣的邏輯，$\max_{x \in (u_k, .. u_n)} \frac{x^* A x}{x^* x} = \lambda_k$
- 比較有趣是，最後 eigenvalue $\max_{x \in (u_n)} \frac{x^* A x}{x^* x} = \lambda_n = \min_{x} \frac{x^* A x}{x^* x}$

上式必須知道 $u_1$ 是否有方法繞過 $u_1$? Yes, 利用 min

考慮任意向量 $w \in \C$ 且 $x \perp w \to z \perp Q^* w$, 令 $V = {z

z\perp Q^* w}$, 且 $V’ = {z

z\perp Q^* w, z_3 = z_4 … = z_n =0}$, 則有 $V’ \subset V$

$\max_{x \perp w} \frac{x^* A x}{x^* x} = \max_{z \perp Q^* w} \frac{z^* D z}{z^* z} = \max_{z \in V} \frac{z^* D z}{z^* z}$ $ \ge \max_{z \in V’} \frac{z^* D z}{z^* z} = \max_{z \in V’} \frac{\lambda_1 |z_1|^2 + \lambda_2 |z_2|^2}{|z_1|^2 + |z_2|^2} \ge \lambda_2$
等號成立當 $w = u_1$

因此 $\min_w \max_{x \perp w} \frac{x^* A x}{x^* x} = \lambda_2$
- 這公式有點難看懂。就是任意的 (n-1) 維子空間，所有最大值中的最小值是 $\lambda_2$. 仔細想一下，如果 $w \nparallel u_1$, (n-1) 維子空間一定包含 $u_1$, R 的最大值是 $\lambda_1$；只有當 $w \parallel u_1$, (n-1) 維子空間不包含 $u_1$, R 的最大值是 $\lambda_2 \le \lambda_1$, 才是正解。因此可以用 $\min_w$ 取代 $w = u_1$.
反過來 $\max_w \min_{x \perp w} \frac{x^* A x}{x^* x} = \lambda_{n-1}$
- 就是任意的 (n-1) 維子空間，所有最小值中的最大值是 $\lambda_{n-1}$. 仔細想一下，如果 $w \nparallel u_n$, (n-1) 維子空間一定包含 $u_n$, R 的最小值是 $\lambda_n$；只有當 $w \parallel u_n$, (n-1) 維子空間不包含 $u_n$, R 的最小值是 $\lambda_{n-1} \ge \lambda_n$. 因此可以用 $\max_w$ 取代 $w = u_n$.
同樣 $\min_{w_1, w_2} \max_{x \perp w_1, w_2} \frac{x^* A x}{x^* x} = \lambda_3$
- 就是任意的 (n-2) 維子空間，所有最大值中的最小值是 $\lambda_3$. 如果 (n-2) 維子空間包含 $u_1$ 或 $u_2$, R 的最大值是 $\lambda_1$ 或 $\lambda_2$；只有當 (n-2) 維子空間不包含 $u_1, u_2$, R 的最大值是 $\lambda_3 \le \lambda_1, \lambda_2$, 才是正解。因此可以用 $\min_{w_1, w_2}$ 取代 $w_1 = u_1, w_2 = u_2$.
反過來 $\max_{w_1, w_2} \min_{x \perp w_1,w_2} \frac{x^* A x}{x^* x} = \lambda_{n-2}$

Courant-Fischer Min-Max Theorem Summary

對於 $n \times n$ 的 Hermitian matrix (或是實數的對稱矩陣)：

$\lambda_k$ 是 $k$-th 大的 eigenvalue, i.e. $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$

$\lambda_k = \min_{dim(V)=n-k+1} \max_{x \in V} R = \min_{dim(V)=n-k+1} \max_{x \in V} \frac{x^* A x}{x^* x}$
$\lambda_k = \max_{dim(V)=k} \min_{x \in V} R = \max_{dim(V)=k} \min_{x \in V} \frac{x^* A x}{x^* x}$

應用 1: Constrained Optimization

最大化 $\mathbf{x}^T A \mathbf{x}, \mathbf{x}$ 滿足 $|\mathbf{x}|^2=\mathbf{x}^T \mathbf{x}=1$ 。

求解這個約束最佳化 (constrained optimization) 問題的傳統方法是引入 Lagrangian multiplier： $L(\mathbf{x}, \lambda) \equiv \mathbf{x}^T A \mathbf{x}-\lambda\left(\mathbf{x}^T \mathbf{x}-1\right)$ 產生極值的必要條件是 $L$ 對 $\mathbf{x}$ 的各元的一次偏導數都等於零, 亦即 $\mathrm{x}$ 是 $L$ 的一個駐點。因為 $A^T=A$, 請讀者自行計算驗證 $\mathbf{0}=\nabla_{\mathbf{x}} L=2(A \mathbf{x}-\lambda \mathbf{x}) 。$ 單位向量 (unit vector) $\mathbf{x}$ 要使 $\mathbf{x}^T A \mathbf{x}$ 最大化的必要條件即為特徵方程式 $A \mathbf{x}=\lambda \mathbf{x}$, 將此式代人二次型可得 $\mathbf{x}^T A \mathbf{x}=\mathbf{x}^T(\lambda \mathbf{x})=\lambda\|\mathbf{x}\|^2=\lambda^{\circ}$ 實對稱矩陣的特徵值必為實數, 因此使二次型最大化的向量 $\mathbf{x}$ 正是對應最大特徵值的特徵向量。

應用 2: 估計 Hermitian 矩陣最大特徵值的下界和最小特徵值的上界

Rayleigh 定理的幾何意義是如果限制 $\mathbf{x}$ 為單位向量, $\lambda_1$ 和 $\lambda_n$ 給出二次型 $\mathbf{x}^* A \mathbf{x}$ 的最大值與最小值。考慮 $\mathbf{e}i=(0, \ldots, 0,1,0, \ldots, 0)^T$, 其第 $i$ 個元素為 1 , 則 $\mathbf{e}_i^* A \mathbf{e}_i=a{i i}{ }^{\circ}$ Rayleigh 定理有這個必然結果: Hermitian 矩陣 $A=\left[a_{i j}\right]$ 的任意主對角元也落在 $\lambda_1$ 和 $\lambda_n$ 之間，即 $\lambda_n \leq a_{i i} \leq \lambda_1$ 。利用此性質可約略估計 Hermitian 矩陣最大特徵值的下界和最小特徵值的上界。另外 $tr(A) = \sum_k^n \lambda_k$.

例如 $A=\left[\begin{array}{lll} 1 & 2 & 3 \\ 2 & 5 & 4 \\ 3 & 4 & 9 \end{array}\right]$ 不需經過計算也可推知 $A$ 的最大特徵值不小於 9 ，最小特徵值不大於 1 。eigenvalues: (12.6, 2.5, -0.1)

應用 3: Wely Theorem

對於兩個 $n \times n$ 的 Hermitian matrix (or real symmetric matrix) $A$ and $B$:

$\lambda_k(A) + \lambda_n(B) \le \lambda_k (A+B) \le \lambda_k(A) + \lambda_1(B)$ 有用的結論

兩個 Hermitian matrixes 的 eigenvalues 範圍
一個 Hermitian matrix 加上一個 semi-definite matrix (所有 eigenvalues 大於等於 0), eigenvalues 必增大

應用 4: Variation Method (Why Variation?)

Let $A, B \in \mathbb{R}^{n \times n}, A^T=A, B^T=B>0$ and $\lambda_1 \leq \lambda_2 \leq \ldots \lambda_n$ be the eigenvalues of $A u=\lambda B u$ with corresponding eigenvectors $u_1, u_2, \ldots, u_n$,

then

\[\min _{x} \frac{x^T A x}{x^T B x}=\lambda_1, \quad \arg \min _{x} \frac{x^T A x}{x^T B x}=u_1 .\]

and

\[\min _{x^T B u_1=0} \frac{x^T A x}{x^T B x}=\lambda_2, \quad \arg \min _{x^T B u_1=0} \frac{x^T A x}{x^T B x}=u_2 .\]

$A u=\lambda B u \to (A-\lambda B)u = 0 \to B^{-1}(A-\lambda B)u = 0 \to B^{-1}A u - \lambda u = 0$
- eigenvalues and eigenvectors 等價 $B^{-1} A$ 的 eigenvalues and eigenvectors.
- $B = I$ 就是 Courant-Fischer 定理。這是一個推廣。
For the matrix pair $(L, D)$, it is known that $\left(\lambda_1, y_1\right)=(0,1)$.
By the variational principle, the relaxed minimal Ncut (2) is equivalent to finding the second smallest eigenpair $\left(\lambda_2, y_2\right)$ of

\[L y=\lambda D y\]

Remarks:

$L$ is extremely sparse and $D$ is diagonal;
Precision requirement for eigenvectors is low, say $\mathcal{O}\left(10^{-3}\right)$.

Graph and Eigenvalue

Posted on 2023-01-28 | In Language

Main Reference

圖和矩陣 Graph and (Adjacency) Matrix

所有 graph 的定義都包含 $G(V, E)$：V 是頂點或節點 (Vertex)，E 是邊 (Edge)。Edge 可以是無方向的稱爲無向圖 (undirected graph). 或是有方向的稱爲有向圖 (directed graph). 甚至混合的稱爲 mixed graph.

下圖我們用

表示節點數目，

表示邊的數目。

不過更重要的是所有的 graphs 都可以用 matrix 表示。這很重要，因爲可以用 linear algebra on graph.

最常用的是鄰接矩陣 (adjacency matrix), $A$：任何相連的 vertex 對應的 edge 為 1, 其餘為 0. 所以也稱爲 connection matrix. Edge 可以是無方向，也可以是有方向的。如果 edge 無方向稱爲無向圖 (undirected graph)，如果 edge 有方向就稱爲有向圖 (directed graph)。
- 無向圖 (undirected graph) 的鄰接矩陣是對稱矩陣 symmetric matrix，一般 trace = 0. (見上圖)
- 有向圖 (directed graph) 的鄰接矩陣一般是非對稱矩陣 symmetric matrix 且 trace $\ne 0$. (見上圖)

另一個是 degree matrix, $D$：每個節點的 edge number. 是 diagonal matrix, 一定是對稱 matrix. (見下圖)

每個頂點的度 (degree of a vertex) : $d(v_i) = \sum_{j=1}^n a_{ij}$
$D = D(G) = diag(d(v_1), d(v_2), …, d(v_n))$

$D$ 的 trace = $\sum_{i=1}^n d(v_i) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} = 2

$, 因爲所有 edge 都被計算兩次。

還有非常重要的 Laplacian matrix：$L = D - A$
- $L$ 每個列向量的和為 0, 所以 $L$ (1) not full rank; (2) 至少一個 eigenvalue = 0; (3) determinant $\det(L) = 0$
- $D-A$ (而非 $A-D$) **的 trace = 2 E ** > 0! 才能得到大於等於 0 的 eigenvalues

Simple Graph Adjacency Matrix

Laplacian Matrix 物理意義

$x^T L x = \frac{1}{2} \sum_{i,j=1}^n a_{ij} (x_i-x_j)^2$ for $\forall x \in \R^n$

證明：$\frac{1}{2} \sum_{i,j=1}^n a_{ij} (x_i-x_j)^2 = \sum_{i=1}^n a_{ii} x_i^2 - \sum_{i\ne j} a_{ij} x_i x_j = x^T L x = x^T (D - A) x$
2D 例子 :

\[\begin{aligned} x^T L x &= [x_1, x_2] \left[\begin{array}{cc} d_1 & -a_{12}\\ -a_{21} & d_2\end{array}\right] \left[\begin{array}{cc} x_1 \\ x_2 \end{array}\right] \\ &= \left[\begin{array}{cc} x_1 d_1 - x_2 a_{21} & x_2 d_2 - x_1 a_{12} \end{array}\right] \left[\begin{array}{cc} x_1 \\ x_2 \end{array}\right] \\ &= d_1 x_1^2 + d_2 x_2^2 - (a_{12}+a_{21}) x_1 x_2 \end{aligned}\]

2D Symmetric Laplacian Matrix：$L = \left[\begin{array}{cc} 1 & -1\ -1 & 1\end{array}\right]$ $d_1 = a_{12} = d_2 = a_{21} =1 $ $\begin{aligned}x^T L x &= d_1 x_1^2 + d_2 x_2^2 - (a_{12}+a_{21}) x_1 x_2 \\&= [a_{12} (x_1 - x_2)^2] = \frac{1}{2}[a_{12} (x_1 - x_2)^2 + a_{21} (x_2 - x_1)^2 ] \\&= \frac{1}{2} \sum_{i,j=1}^2 a_{ij} (x_i - x_j)^2 \end{aligned}$
3D 例子 :

\[\begin{aligned} x^T L x &= [x_1, x_2, x_3] \left[\begin{array}{cc} d_1 & -a_{12} & -a_{13}\\ -a_{21} & d_2 & -a_{23} \\ -a_{31} & -a_{32} & d_3\end{array}\right] \left[\begin{array}{cc} x_1 \\ x_2 \\ x_3\end{array}\right] \\ &= \left[\begin{array}{cc} x_1 d_1 - x_2 a_{21} - x_3 a_{31} & x_2 d_2 - x_1 a_{12} - x_3 a_{32} & x_3 d_3 - x_1 a_{13} - x_2 a_{23} \end{array}\right] \left[\begin{array}{cc} x_1 \\ x_2 \\ x_3 \end{array}\right] \\ &= d_1 x_1^2 + d_2 x_2^2 + d_3 x_3^2 - (a_{12}+a_{21}) x_1 x_2 - (a_{13}+a_{31}) x_1 x_3 - (a_{23}+a_{32}) x_2 x_3 \end{aligned}\]

3D Symmetric Laplacian Matrix：$d_1 = a_{12} + a_{13} ; d_2 = a_{21} + a_{23} ; d_3 = a_{31} + a_{32}$ and $a_{ij} = a_{ji}$ $\begin{aligned}x^T L x &= d_1 x_1^2 + d_2 x_2^2 + d_3 x_3^2 - (a_{12}+a_{21}) x_1 x_2 - (a_{13}+a_{31}) x_1 x_3 - (a_{23}+a_{32}) x_2 x_3 \\ &= a_{12} (x_1 - x_2)^2 + a_{13} (x_1 - x_3)^2 + a_{23} (x_2 - x_3)^2 \\&= \frac{1}{2} \sum_{i,j=1}^3 a_{ij} (x_i - x_j)^2 \end{aligned}$

Graph Spectrum 觀念 (Eigenvalues)

圖的特徵值 Graph‘s Eigenvalue

先提示一些 eigenvalue 相關的定理

用 2x2 matrix 為例子

Eigenvalue 和 eigenvector 加入作爲代數的判斷，定義如下： $\left[\begin{array} {cc}a_{11} & a_{12}\\a_{21} & a_{22}\end{array}\right] \left[\begin{array} {cc}v_{1} \\v_{2}\end{array}\right]= \lambda \left[\begin{array} {cc}v_{1} \\v_{2}\end{array}\right]$

\[(a_{11}-\lambda)(a_{22}-\lambda) - a_{21}a_{12} = 0 \\ \lambda^2 - (a_{11}+a_{22})\lambda + (a_{11} a_{22}-a_{12}a_{21}) = 0 \\ \lambda = \frac{a_{11}+a_{22}\pm \sqrt{(a_{11}+a_{22})^2-4(a_{11} a_{22}-a_{12}a_{21})}}{2} \\ \lambda = \frac{a_{11}+a_{22}\pm \sqrt{(a_{11}-a_{22})^2+4 a_{12}a_{21}}}{2}\]

相似矩陣

$P^{-1} A P = B$ 若 $P$ 是 invertable, $A, B$ 稱爲相似矩陣 (similar matrix).
- Similar matrix $A, B$ 有相同 (1) rank; (2) eigenvalues (but not eigenvector!); (3) determinant.
可以證明：$A$ (nxn) 和 $A^T$ (nxn) 是相似矩陣：有相同的 rank, eigenvalues (but not eigenvectors!), determinant.
Eigenvalue decomposition: $A = Q D Q^{-1}$, 所以 $A, D$ 是相似矩陣：相同 rank, eigenvalues, determinant.
- 如果 $A$ 是對稱 matrix, 所有 eigenvalues 都是實數。所有 eigenvectors 都正交 $Q^{-1} = Q^T$
- Trace theorem: 所有 eigenvalues 的和 = matrix trace, i.e. tr(A) = tr(D)

實數對稱矩陣 (Special Case of Complex Hermitian Matrix)

Eigenvalue decomposition (EVD, invariant basis) 和 Singular value decomposition (SVD, orthogonal basis) 的 eigen-values 和 singular values 以及 eigen-vectors 和 singular vectors 基本一樣，最多差一個正負號。
Eigenvalues 是實數, i.e. $\lambda_k^* = \lambda_k$ (非對稱矩陣可能有複數 eigenvalues)
Eigenvectors 是正交 vectors, i.e. $Q Q^* = I$, where $Q$ 的 column vectors 是 (right) eigenvectors, $u_k$.
- $Q = [u_1, u_2, …, u_n]$ and $Q Q^* = I \to u_i u_j^* = \delta_{ij}$
- $Q Q^* = I \to Q^{-1} = Q^* to A = Q D Q^{-1} = Q D Q^*$
Courant-Fischer min-max 定理，對於 $n \times n$ 的 Hermitian matrix (或是實數的對稱矩陣)：
- $x^* A x$ 一定是實數
- $\lambda_k$ 是 $k$-th 大的 eigenvalue, i.e. $\lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_n$
- $\lambda_k = \min_{dim(V)=n-k+1} \max_{x \in V} R = \min_{dim(V)=n-k+1} \max_{x \in V} \frac{x^* A x}{x^* x}$
- $\lambda_k = \max_{dim(V)=k} \min_{x \in V} R = \max_{dim(V)=k} \min_{x \in V} \frac{x^* A x}{x^* x}$
Wyle 定理
- 兩個 Hermitian matrixes 的 eigenvalues 範圍
- 一個 Hermitian matrix $A$ 加上一個 semi-definite matrix $B$ (all $\lambda_k \ge 0$), $A+B$ eigenvalues 必增大

Diagonal Dominant Matrix

最重要的結論：Laplacian matrix 是 weakly diagonal dominance : positive semi-definite.

Laplacian matrix 的所有 eigenvalues 都大於或等於 0.
Laplacian matrix 是 singular 因爲最小的 eigenvalue 為 0.

如果矩陣 $A$ 每個 row 的對角元素大於等於其他所有非對角元素和，稱爲 diagonally dominant matrix.

Weakly (有等號) diagonal dominance: $ a_{ii} \ge \sum_{i\ne j} a_{ij} $ for all $i$
Strictly (沒等號) diagonal dominance: $ a_{ii} > \sum_{i\ne j} a_{ij} $ for all $i$
例一：$\left[\begin{array} {cc}3 & -2 & 1 \ 1 & -3 & 2 \-1 & 2 & 4\end{array}\right],$ Strick diagonal dominance.
例二：Laplacian matrix $\left[\begin{array}{cc} d_1 & -a_{12} & -a_{13}\ -a_{21} & d_2 & -a_{23} \ -a_{31} & -a_{32} & d_3\end{array}\right]$, Weak diagonal dominance.

例三：band matrix $\left[\begin{array}{cc} d_1 & a_{12} & 0\ a_{21} & d_2 & a_{23} \ 0 & a_{32} & d_3\end{array}\right]$, 如果 $

d_i

> \sum_j a_{ij}$

Gershgorin’s circle theorem:
- Strictly diagonal dominance (Laplacian matrix 不屬於此例)：non-singular.
- Hermitian (or real symmetric) and strictly diagonal dominance (Laplacian matrix 不屬於此例) : positive definite.
- Hermitian (or real symmetric) and weakly diagonal dominance (Laplacian matrix 屬於此例) : positive semi-definite.

全正矩陣

Perron-Frobenius theorem: for all positive element matrix, the dominant (largest) eigenvalue is bounded between the lowest sum of a row the biggest sum of a row. 並且 dominant eigenvalue 對應全正值 (或全負值 x -1) 的 eigenvector.

無向圖的特徵值 Undirected Graph’s Eigenvalue

無向圖 (undirected graph) 的鄰接矩陣 $A$ 是對稱矩陣 symmetric matrix 且 Trace = 0，所以：

所有 eigenvalues 都是實數，所有的 eigenvectors 都是正交 vectors。
最大的 eigenvalue 介於最大和最小的 degree 之間 (Perron-Frobenius theorem)。上圖 6 節點鄰接矩陣的最大 eigenvalue: $1 \le \lambda_{max} \le 3$. 所有的 eigenvalues: (2.54, 1.08, 0.26, -0.54, -1.2, -2.13), 總和 0 (=trace).
一般無向圖比較少提自環 (self-loop, i loop to i), 所以 trace = 0。eigenvalues 和為 0. 所以其他的 eigenvalue 存在負數。這和 Laplacian matrix 的 eigenvalues $\ge 0$ 不同。

無向圖的 Laplacian matrix $L$ 也是對稱 matrix，Trace = 2 *|E| > 0，row vectors = 0，所以：

所有 eigenvalues (稱為 spectrum) 都是正實數。最小的 eigenvalue = 0，對應的 eigenvector = [1, .., 1]’. (可以直接驗證)
$\lambda_{n−1} \ge …\ge \lambda_{1} \ge \lambda_{0}=0$ 下圖的 Laplacian eigenvalues: (4.89, 3.70, 3, 1.68, 0.72, 0), 總和 14 (=trace).
[1,1,…,1]’ 對應 $Ax=\lambda_0 x$ where $\lambda_0=0$.
比較有意思的是 $\lambda_1$ 對應最少 cut 的 graph partition. 如果 $\lambda_1=0$ 代表 graph 有不連通的 subgraph.
- 如果兩個 subgraphs 不連通，Laplacian $\lambda_1 = \lambda_0 =0$。兩個 subgraphs 對應的 eigen-vectors [111100] 和 [000011] 對應兩個 0 eigenvalues. 依次類推多個不連通 subgraphs.
因為 minimize yLy’, where y = (1, 1, .. -1, -1)? (To be checked!)

有向圖的特徵值 Directed Graph’s Eigenvalue

有向圖 (directed graph) 的鄰接矩陣 $A$ 一般是非對稱矩陣且 Trace $\ge$ 0，如下圖的三個例子

非對稱矩陣的 eigenvalues 不一定是實數。但非對稱鄰接矩陣所有 elements 都是正值，仍然適用 Perron-Frobenius theorem.
- 最大的 eigenvalue 是實數，介於最大和最小的 degree 之間。其對應的 eigenvector 可以全為正值。
- eigenvalue 總和 = trace.
下圖的三個 adjacency matrix A, B, C 的 eigenvalue $1 \le \lambda_{max} \le 2$.
- A 的 eigenvalues (1.62, -0.62), 總和 1.
- B 的 eigenvalues (1.62, 0.5+0.87i, 0.5-0.87i, -0.62), 總和 2.
- C 的 eigenvalues (1.62, -0.5+0.87i, -0.5-0.87i, -0.62), 總和 0.

有向圖比較少但可以定義 Laplacian matrix。分成 in-degree Laplacian 和 out-degree Laplacian, 見下圖。
- Out-degree 是每個節點的 out-link 數目，所以 Out-Degree Laplacian 的 row vector 為 0。
- In-degree 是每個節點的 in-links 數目，所以 In-Degree Laplacian 的 column vector 為 0。
- 一定有 eigenvalue = 0, 但和無向圖的 Laplacian 一樣特性嗎? probably not, TBC.
- 下例：
  - 鄰接矩陣的 eigenvalues (1.32, -0.66+0.56i, -0.66-0.56i), 總和 0.
  - Out-degree Laplacian 的 eigenvalues (2, 2, 0), 總和 4.
  - In-degree Laplacian 的 eigenvalues (2, 2, 0), 總和 4. 和 out-degree 一樣。

權重圖和權重矩陣 Weight Matrix Vs. Adjacency Matrix

我們可以延伸 graph $G(V, E)$ 的定義從 (hard) edge (邊), $E = {(v_i, v_j)

a_{ij} \in 0, 1}$; 變成 (soft) link (連結), $E = {(v_i, v_j)

w_{ij} \ge 0 }$. (Continuous or discrete) weights 擴大了 graph 的應用！

$V = {v_i}$ 是頂點的集合。$ V = n$ 代表節點數目，相當 graph 的大小。
$W \in \R^{n\times n}$ 稱爲權重矩陣：$w_{ij} = \begin{cases} w_{ij} \ge 0 & \text{ if } i \neq j \ 0 & \text{ else } i = j \end{cases}$
如果 $w_{ij} \in {0, 1}$, $W = A$ 權重矩陣化簡成鄰接矩陣。
此時自然需要修正頂點的度 (degree of a vertex) 的定義和 Degree matrix.
- $d(v_i) = \sum_{j=1}^n w_{ij}$ degree of $v_i$
- $D = D(G) = diag( d(v_1), d(v_2), …, d(v_n))$
Laplacian matrix 需要被修改 $L = D - A \to L = D - W$, 基本特性都一樣！
- 所有 eigenvalues 都是正實數：$\lambda_{n−1} \ge …\ge \lambda_{1} \ge \lambda_{0}=0$. 最小的 eigenvalue $\lambda_0 = 0$，對應的 eigenvector = $u_0 = [1, .., 1]^T$, i.e. $L \mathbf{1}^T = \mathbf{0} $
- $x^T L x = \frac{1}{2}\sum_{i,j} w_{ij} (x_i - x_j)^2 \ge 0$

此外我們定義更多的東西，記得之前 $tr(D) = tr(L) = \sum_{i=1}^n d(v_i) = \sum_{i=1}^n \sum_{j=1}^n a_{ij} = 2

$, 現在改成體積！

Volume：對於 $V$ 的 subset $A, A \subseteq V$, 定義 $A$ 的 volume (體積)
- $vol(A) = \sum_{v_i \in A} d(v_i) = \sum_{v_i \in A} \sum_{j=1}^n w_{ij}$
- 如果 $A = V$, $vol(V) = \sum_{i=1}^n d(v_i) = tr(D) = tr(L) = \sum_{i=1}^n \sum_{j=1}^n w_{ij}$
  - 如果 $w_{ij} = a_{ij}$， $vol(V) = 2 E $
- 所以 $vol(A)$ 就是部分的 trace of $D$ or $L$.
Links(A, B)，節點集之間的連接：Give two subsets of vertices $A, B \subseteq V$. 定義 $links(A, B) = \sum_{v_i \in A, v_j \in B} w_{ij}$
- A and B 不一定是互斥
- 因爲 $w_{ij} = w_{ji}$, $W$ 是 symmetric matrix, 所以 $links(A, B) = links(B, A)$
- $vol(A) = links(A, V) = \sum_{v_i \in A, v_j \in V} w_{ij} $
- 上例 $ links(A,V) = w_{12} + w_{13} + w_{31} + w_{32} +w_{34} = vol(A)$
Cut(A)：$cut(A) = links(A, V-A)$
- 上例 $ cut(A) = links(A, V-A) = w_{12} + w_{32} + w_{34}$
Assoc(A) : $assoc(A) = links(A, A)$
- 上例 $ assoc(A) = links(A, A) = w_{13} + w_{31}$
$vol(A) = cut(A) + assoc(A)$