Flow $\phi_t$ = 連續的 (自己到自己) 座標變換
Flow $\phi_t$ 定義: $x_t \triangleq \phi_t\left(x_0\right)$。顯然 $t=0 \Rightarrow x_0 = \phi_0\left(x_0\right)$,即 $\phi_0 = \mathrm{id}$。$\phi_t(x)$ 必須可微分且可逆。
(重要) 同時 $x_t$ 也要滿足運動方程:$\frac{d x_t}{d t}=u_t(x_t)$ 所以會得到以下的方程式: \(\frac{d \phi_t(x_0)}{d t}=u_t\left(\phi_t\left(x_0\right)\right)\) 因為 $\phi_t(x)$ 是自己到自己的座標變換,在某些情況,可以省掉 subscript.
\(\frac{d \phi_t(x)}{d t}=u_t\left(\phi_t\left(x\right)\right)\) 有無窮多的 $\phi_t$ and $u_t$ 滿足 $p_0(x)$ 到 $p_1(x)$ 的分佈轉換。最直接而且簡單的就是 linear interpolation flow.
- Linear flow: 從 $x_0$ 的角度,是從 $x_0$ 到 $x_1$ 是直線,Wrong! 因為這代表從 $x_0$ 開始,遇到一個固定, constant 的 vector field ($u_t(x)$). 這代表 output distribution 和 input distribution 一樣,最多是 mean shift! 這種固定 and deterministic vector field 沒有太大的用處。
- 即使同樣的 distribution with shift mean, 我們可以看到 $u_t(x)$ 也不是 constant vector.
- 退而求其次,是從 $x_1$ 的角度, given $x_1$,對應的 $x_0$ 是直線。另外不是一個 $x_0$ 而是一個 distribution. 其 conditional vector field 是固定, constant 值。
以上太抽象,我們看實際的例子。
Training ($t, x_0, x_1 \to u_t(x_t)$)
Flow matching 神奇三部曲如下: Method 1 (直接法): Global flow matching \(\begin{aligned} &\mathcal{L}_{\mathrm{FM}}(\theta)=\mathbb{E}_{t, p_t(x)}\left\|v_t(x)-u_t(x)\right\|^2\\ \end{aligned}\)
- 一般 $p_0 \sim N(0, I)$, 但是 $p_1$ 未知,所以 $p_t$ 也未知。
- 除非是非常簡單的 $p_1$,同時用 linear interpolation $x_t = (1-t) x_0 + t x_1$ , 可以直接計算 $u_t(x)$. Really, how? 我覺得還是要用 conditional flow 的定義!
Method 2 (間接法): Conditional flow match: \(\begin{aligned} &\mathcal{L}_{\mathrm{CFM}}(\theta)=\mathbb{E}_{t, q(x_1), p_t(x \mid x_1)}\left\|v_t(x)-u_t(x \mid x_1)\right\|^2,\\ \end{aligned}\)
- 此時需要 $p_t(x\mid x_1)$,一般假設 Gaussian. 同時利用 linear interpolation $x_t \mid x_1 = (1-t) x_0 + t x_1$ , 應該可以導出這個 Gaussian 的 closed-form,或是可以用來計算 conditional flow. 但是無法得到 marginal flow 因爲 $p_1$ 是未知。
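Method 2 的 training pair 可以用下面的 sketch 構造(numpy;假設 linear path $x_t=(1-t)x_0+t x_1$ 且 $\sigma_{min}=0$;`cfm_training_pairs` 與數值都是示意用的假設):

```python
import numpy as np

def cfm_training_pairs(x1_batch, rng):
    """Build (t, x_t, target) regression pairs for the linear path
    x_t = (1-t) x0 + t x1, with x0 ~ N(0, I)  (sigma_min = 0)."""
    n, d = x1_batch.shape
    t = rng.uniform(size=(n, 1))        # t ~ U[0, 1]
    x0 = rng.standard_normal((n, d))    # x0 ~ N(0, I)
    xt = (1 - t) * x0 + t * x1_batch    # a point on the conditional path
    target = x1_batch - x0              # d x_t / dt: the regression target for v_t
    return t, xt, target

rng = np.random.default_rng(0)
x1 = rng.standard_normal((4, 2)) + 10.0  # pretend these are data samples from q(x1)
t, xt, target = cfm_training_pairs(x1, rng)
```

實際 training 時用 $v_\theta(t, x_t)$ 對 `target` 做 MSE regression,就是 $\mathcal{L}_{\mathrm{CFM}}$。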
注意
- $\mathcal{L}_{\mathrm{FM}} \ne \mathcal{L}_{\mathrm{CFM}}$,但是 $\min\mathcal{L}_{\mathrm{FM}} \equiv \min\mathcal{L}_{\mathrm{CFM}}$(兩者的 gradient 相同,只差一個與 $\theta$ 無關的常數)
- 所以 $u_t(x) \ne u_t\left(x \mid x_1\right)$ => global flow 和 conditional flow 可能不一致?
Method 3 (間接間接法): 轉換成可以 sample 的 distribution, $t, x_1, x_0$
- 重點是如何假設 $\psi_t(x_0)$ 和 $x_0, x_1$ 的關係。最簡單就是 linear interpolation. \(\begin{aligned} &\mathcal{L}_{\mathrm{CFM}}(\theta)=\mathbb{E}_{t, q(x_1), p(x_0)}\left\|v_t(\psi_t(x_0))-\frac{d}{d t} \psi_t\left(x_0\right)\right\|^2 \end{aligned}\)
- 這個表示把 $p_0$ and $x_0$ 帶出來。不過和 method 2 應該等價。
Inferencing/Sampling ($x_0, u_t, t \to x_1$)
此時和 conditional flow $u_t(x\mid x_1)$ 完全無關,因爲我們沒有 $x_1.$
一旦有 $u_t(x)$ 或是其近似 $v_t(x)$,就可以 sample.
- 先從 $p_0$ randomly sample.
- 利用 $u_t(x)$ 可以逐步得到 $p_1$ 的 sample.
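以上兩步可以 sketch 成最簡單的 Euler sampler(這裡用一個 toy 的 constant mean-shift field 代替 learned $v_t$,只示範積分 loop,參數是假設的):

```python
import numpy as np

def euler_sample(u, x0, n_steps=100):
    """Integrate dx/dt = u_t(x) from t=0 to t=1 with explicit Euler steps."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        x = x + dt * u(k * dt, x)
    return x

# Toy field: constant shift mu1 - mu0, which transports N(mu0, I) to N(mu1, I).
mu0, mu1 = -10.0, 10.0
rng = np.random.default_rng(0)
x0 = mu0 + rng.standard_normal(5000)   # samples from p0
x1 = euler_sample(lambda t, x: np.full_like(x, mu1 - mu0), x0)
```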
Compute Likelihood ($x_1, u_t, t \to x_0$)
應該是 sampling 的反向:從 $x_1$ 出發,沿 $-u_t$ 把 ODE 積分回 $x_0$;likelihood 則再用 instantaneous change of variables $\frac{d}{dt}\log p_t(x_t) = -\nabla\cdot u_t(x_t)$ 沿路徑累積。
Conditional Vector Field: New!
再來最神奇的部分: deterministic 的 vector field $u_t(x)$ 也可以有 conditional on $x_1$ vector field! 也就是從 deterministic field 變成一個 conditional distribution 的期望值!
Interestingly, we can also define a marginal vector field, by “marginalizing” over the conditional vector fields in the following sense (we assume $p_t(x)>0$ for all $t$ and $x$ ):
\(u_t(x)=\mathbb{E}_{x_1\sim p_{1\mid t}}[u_t(x \mid x_1)] =\int u_t(x \mid x_1) \frac{p_t(x \mid x_1) q(x_1)}{p_t(x)} d x_1\) where \(\begin{aligned} p_{1\mid t} &= p(x_1\mid x_t) = \frac{p_t(x \mid x_1) q(x_1)}{p_t(x)}\\ \end{aligned}\)
同樣我們看兩個 cases $t=0$ \(\begin{aligned} u_0(x)&=\int u_0(x \mid x_1) \frac{p_0(x \mid x_1) q(x_1)}{p_0(x)} d x_1\\ &=\int u_0(x \mid x_1) q(x_1) d x_1\\ \end{aligned}\) For OT case, $u_0(x\mid x_1)=x_1-x$ 所以在 OT case, 每個 $x_0$ 都會先指向 $x_1$ 的平均值。 \(\begin{aligned} u_0(x)&=\int u_0(x \mid x_1) q(x_1) d x_1 = \mathbb{E}[x_1]-x\\ \end{aligned}\) $t=1$ \(\begin{aligned} u_1(x)&=\int u_1(x \mid x_1) \frac{p_1(x \mid x_1) q(x_1)}{p_1(x)} d x_1\\ &=\int u_1(x \mid x_1) \frac{\delta(x-x_1) q(x_1)}{p_1(x)} d x_1\\ &\approx u_1(x \mid x) = u_1(x)\\ \end{aligned}\)
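OT case 的結論 $u_0(x) = \mathbb{E}[x_1] - x$ 可以用 Monte Carlo 驗證(sketch,假設 $q(x_1)=\mathcal{N}(\boldsymbol{\mu}, I)$、$\boldsymbol{\mu}=[10,0]$):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([10.0, 0.0])
x1 = mu + rng.standard_normal((100_000, 2))  # samples from q(x1) = N(mu, I)

x = np.array([-10.0, 0.0])                   # an arbitrary point at t = 0
# Marginal field at t=0: average the conditional OT fields u_0(x|x1) = x1 - x
u0 = (x1 - x).mean(axis=0)
```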
利用 correlation matrix
Step 1: Define Joint Distribution
The vector $\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_t \end{bmatrix}$ is jointly Gaussian since $\mathbf{x}_t$ is a linear combination of $\mathbf{x}_0$ and $\mathbf{x}_1$. Compute its moments:
- Means:
\(\mathbb{E}[\mathbf{x}_0] = -\boldsymbol{\mu}, \quad \mathbb{E}[\mathbf{x}_t] = (1-t)(-\boldsymbol{\mu}) + t(\boldsymbol{\mu}) = (2t-1)\boldsymbol{\mu}.\) - Covariances:
\(\text{Cov}(\mathbf{x}_0) = \mathbf{I}, \quad \text{Cov}(\mathbf{x}_t) = (1-t)^2\mathbf{I} + t^2\mathbf{I} = \sigma_t^2 \mathbf{I}, \quad \sigma_t^2 = 2t^2 - 2t + 1.\) - Cross-Covariance:
\(\text{Cov}(\mathbf{x}_0, \mathbf{x}_t) = \mathbb{E}[(\mathbf{x}_0 + \boldsymbol{\mu})(\mathbf{x}_t - (2t-1)\boldsymbol{\mu})^\top] = (1-t)\mathbf{I},\)
since $\mathbf{x}_1 - \boldsymbol{\mu}$ is independent of $\mathbf{x}_0 + \boldsymbol{\mu}$ and has zero mean.
Step 2: Apply Gaussian Conditioning Formula
For jointly Gaussian vectors $\begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{bmatrix}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix} \right)$,
\(\mathbb{E}[\mathbf{a} \mid \mathbf{b} = \mathbf{x}] = \boldsymbol{\mu}_a + \Sigma_{ab} \Sigma_{bb}^{-1} (\mathbf{x} - \boldsymbol{\mu}_b).\)
Here, $\mathbf{a} = \mathbf{x}_0$, $\mathbf{b} = \mathbf{x}_t$, and:
\(\boldsymbol{\mu}_a = -\boldsymbol{\mu}, \quad \boldsymbol{\mu}_b = (2t-1)\boldsymbol{\mu}, \quad \Sigma_{ab} = (1-t)\mathbf{I}, \quad \Sigma_{bb} = \sigma_t^2 \mathbf{I}.\)
Step 3: Substitute and Simplify
\(\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = -\boldsymbol{\mu} + \left[(1-t)\mathbf{I}\right] \left[\sigma_t^2 \mathbf{I}\right]^{-1} (\mathbf{x} - (2t-1)\boldsymbol{\mu}).\)
Since $\left[\sigma_t^2 \mathbf{I}\right]^{-1} = \frac{1}{\sigma_t^2} \mathbf{I}$:
\(= -\boldsymbol{\mu} + \frac{1-t}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}).\)
Final Result
\(\boxed{\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = -\boldsymbol{\mu} + \dfrac{1-t}{\sigma_t^{2}} \left( \mathbf{x} - (2t-1)\boldsymbol{\mu} \right)}\)
Intuition
The term $\frac{1-t}{\sigma_t^2}$ represents the regression coefficient adjusting for the correlation between $\mathbf{x}_0$ and $\mathbf{x}_t$. The expression linearly combines the prior mean $-\boldsymbol{\mu}$ with the deviation of $\mathbf{x}$ from the marginal mean $(2t-1)\boldsymbol{\mu}$, scaled by the relative variance contribution of $\mathbf{x}_0$ to $\mathbf{x}_t$.
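上面的 Gaussian conditioning 結果可以用 simulation 驗證:$\mathbb{E}[x_0\mid x_t]$ 對 $x_t$ 的 regression slope 應該等於 $\mathrm{Cov}(x_0,x_t)/\mathrm{Var}(x_t)=(1-t)/\sigma_t^2$(1D sketch,$t=0.3$ 是隨意選的):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, t, n = 10.0, 0.3, 200_000
x0 = -mu + rng.standard_normal(n)      # x0 ~ N(-mu, 1)
x1 = +mu + rng.standard_normal(n)      # x1 ~ N(+mu, 1), independent of x0
xt = (1 - t) * x0 + t * x1             # linear interpolation path

sigma2_t = 2 * t**2 - 2 * t + 1
# E[x0 | xt] is linear in xt with slope Cov(x0, xt) / Var(xt) = (1-t) / sigma_t^2
slope = np.cov(x0, xt)[0, 1] / np.var(xt)
```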
例一:Gaussian-to-Gaussian (from Cambridge blog)
先定義兩個 2D Gaussian distributions, 以下的 $\mu=10$ \(\begin{gathered} p_0=\mathcal{N}([-\mu, 0], I) \text { and } p_1=\mathcal{N}([+\mu, 0], I) \\ \end{gathered}\) Given:
- Prior distribution $p_0$: $\mathcal{N}(\mathbf{x}; -\boldsymbol{\mu}, \mathbf{I})$,
- Target distribution $p_1$: $\mathcal{N}(\mathbf{x}; +\boldsymbol{\mu}, \mathbf{I})$, OT (Optimal Transport, 就是線性內差)
- Linear interpolation path: $\mathbf{x}_t = t \mathbf{x}_1 + (1-t) \mathbf{x}_0$.
- Interpolation distribution $p_t$: $\mathcal{N}(\mathbf{x}; (2t-1)\boldsymbol{\mu}, \sigma^2_t\,\mathbf{I})$, mean 是內差,variance 是 scaled 平方和
- Mean of $\mathbf{x}_t$: \(\mathbb{E}[\mathbf{x}_t] = (1-t)\mathbb{E}[\mathbf{x}_0] + t\mathbb{E}[\mathbf{x}_1] = (1-t)(-\boldsymbol{\mu}) + t(\boldsymbol{\mu}) = (2t - 1)\boldsymbol{\mu}\)
- Covariance of $\mathbf{x}_t$: \(\text{Cov}(\mathbf{x}_t) = (1-t)^2 \text{Cov}(\mathbf{x}_0) + t^2 \text{Cov}(\mathbf{x}_1) = (1-t)^2 \mathbf{I} + t^2 \mathbf{I} = \sigma_t^2 \mathbf{I}\) where $\sigma_t^2 = (1-t)^2 + t^2 = 2t^2 - 2t + 1$.
We want to compute the vector field $\mathbf{u}_t(\mathbf{x})$ that transports samples from $p_0$ to $p_1$.
找 $u_t(x)$ 對應 real-life 的 training
Method 1 利用 global flow \(\begin{aligned} &\mathcal{L}_{\mathrm{FM}}(\theta)=\mathbb{E}_{t, p_t(x)}\left\|v_t(x)-u_t(x)\right\|^2\\ \end{aligned}\)
- 假設 OT: $x_t =(1-t) x_0+t x_1$, 因為 $x_0, x_1$ 都是 Gaussians, 可以直接計算 $p_t(x)$
直接計算 $u_t$
另一個是 conditional flow
\(\begin{aligned} &\mathcal{L}_{\mathrm{CFM}}(\theta)=\mathbb{E}_{t, q(x_1), p(x_0)}\left\|v_t(\psi_t(x_0))-\frac{d}{d t} \psi_t\left(x_0\right)\right\|^2 \end{aligned}\) —
Method 1: 直接計算 Vector Field, 不用 Conditional VF.
其實還是用 conditional flow 加上 OT, 只是直接計算 $u_t(x)$!
Key Insight:
The linear interpolation path is defined for pairs $(\mathbf{x}_0, \mathbf{x}_1)$, where $\mathbf{x}_0 \sim p_0$ and $\mathbf{x}_1 \sim p_1$. Lipman’s method uses the independent coupling, where $\mathbf{x}_0$ and $\mathbf{x}_1$ are sampled independently. The conditional vector field for a given pair is: \(\mathbf{v}_t(\mathbf{x}_t \mid \mathbf{x}_0, \mathbf{x}_1) = \frac{d\mathbf{x}_t}{dt} = \mathbf{x}_1 - \mathbf{x}_0.\) The marginal vector field $\mathbf{u}_t(\mathbf{x})$ is the conditional expectation: \(\mathbf{u}_t(\mathbf{x}) = \mathbb{E}_{p(\mathbf{x}_0, \mathbf{x}_1 \mid \mathbf{x}_t = \mathbf{x})} \left[ \mathbf{x}_1 - \mathbf{x}_0 \right].\)
Derivation:
-
Marginal Distribution at Time $t$: (很直觀) The interpolation $\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$ with independent $\mathbf{x}_0 \sim \mathcal{N}(-\boldsymbol{\mu}, \mathbf{I})$ and $\mathbf{x}_1 \sim \mathcal{N}(+\boldsymbol{\mu}, \mathbf{I})$ gives: \(\mathbf{x}_t \sim \mathcal{N}\left(\boldsymbol{\mu}_t, \sigma_t^2 \mathbf{I} \right),\) where $\boldsymbol{\mu}_t=(2t-1)\boldsymbol{\mu}$ and $\sigma_t^2 = (1-t)^2 + t^2 = 2t^2 - 2t + 1$.
-
Conditional Expectations (Appendix D, 最關鍵的一步): Using Gaussian conditioning, the posterior expectations given $\mathbf{x}_t = \mathbf{x}$ are: \(\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = -\boldsymbol{\mu} + \frac{1-t}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu})=-\boldsymbol{\mu} + \frac{1-t}{\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t),\) Check
- $t=1$, $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}_1] = \mathbb{E}[\mathbf{x}_0] = -\boldsymbol{\mu}$ (因為 independent, variance to I)
- $t=0$, $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}_0] = \mathbb{E}[\mathbf{x}_0\mid \mathbf{x}_0] = \mathbf{x}_0$ (因為 deterministic, $\mu_t=-\mu$, $\sigma^2_t=1$)
\(\mathbb{E}[\mathbf{x}_1 \mid \mathbf{x}_t = \mathbf{x}] = +\boldsymbol{\mu} + \frac{t}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}).\)
- Vector Field:
\(\mathbf{u}_t(\mathbf{x}) = \mathbb{E}[\mathbf{x}_1 - \mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = \mathbb{E}[\mathbf{x}_1 \mid \mathbf{x}_t = \mathbf{x}] - \mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}].\)
Substituting the expressions:
\(\mathbf{u}_t(\mathbf{x}) = \left[ +\boldsymbol{\mu} + \frac{t}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}) \right] - \left[ -\boldsymbol{\mu} + \frac{1-t}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}) \right].\)
Simplifying:
\(\mathbf{u}_t(\mathbf{x}) = 2\boldsymbol{\mu} + \frac{2t-1}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}).\)
如果非對稱 means, 應該和 mean difference 差有關。
\(\mathbf{u}_t(\mathbf{x}) = \boldsymbol{\mu}_1-\boldsymbol{\mu}_0 + \frac{\dot{\sigma}_t^2}{2\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t).\)
Further simplification yields:
\(\boxed{\mathbf{u}_t(\mathbf{x}) = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{\sigma_t^2}, \quad \sigma_t^2 = 2t^2 - 2t + 1.}\)
Check
- $t=0$, $\mathbf{u}_0(\mathbf{x}) = \boldsymbol{\mu} - \mathbf{x}_0 = \mathbb{E}(\mathbf{x}_1)-\mathbf{x}_0$ (因為 independent, 取 $x_1$ 平均作為 vector 終點)
- $t=1$, $\mathbf{u}_1(\mathbf{x}) = \boldsymbol{\mu} + \mathbf{x}_1 = -(\mathbb{E}(\mathbf{x}_0)-\mathbf{x}_1)$ (基本是上面的反向,所以多一個負號)
Final Closed-Form:
With $\boldsymbol{\mu} = [10, 0]$, the vector field is: \(\mathbf{u}_t(\mathbf{x}) = \frac{1}{2t^2 - 2t + 1} \begin{pmatrix} (2t-1)x_1 + 10 \\ (2t-1)x_2 \end{pmatrix},\) where $\mathbf{x} = [x_1, x_2]^\top$.
Verification:
- At $t=0$: $\mathbf{u}_0(\mathbf{x}) = [-x_1 + 10, -x_2]^\top$, which points every $\mathbf{x}$ toward the target mean $[10, 0]^\top$, as expected for samples from $\mathcal{N}([-10, 0]^\top, \mathbf{I})$.
- At $t=1$: $\mathbf{u}_1(\mathbf{x}) = [x_1 + 10, x_2]^\top$, which transports to $\mathcal{N}([10, 0]^\top, \mathbf{I})$.
- The continuity equation $\frac{\partial p_t}{\partial t} + \nabla \cdot (p_t \mathbf{u}_t) = 0$ holds for the Gaussian path.
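這些 verification 可以直接數值檢查;另外注意 $\mathbf{u}_t(\boldsymbol{\mu}_t)=2\boldsymbol{\mu}$(由 $(2t-1)^2+1=2\sigma_t^2$),即 mean 沿直線等速移動(sketch):

```python
import numpy as np

mu = np.array([10.0, 0.0])

def u(t, x):
    """Closed-form marginal field u_t(x) = ((2t-1) x + mu) / sigma_t^2."""
    sigma2 = 2 * t**2 - 2 * t + 1
    return ((2 * t - 1) * x + mu) / sigma2

x = np.array([3.0, -2.0])          # an arbitrary test point
u0 = u(0.0, x)                     # boundary limit: mu - x
u1 = u(1.0, x)                     # boundary limit: mu + x
# Along the mean path x = mu_t = (2t-1) mu, the field is the constant 2 mu.
mids = [u(t, (2 * t - 1) * mu) for t in (0.1, 0.5, 0.9)]
```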
Validate the conservation of probability (Appendix E)!
\(\boxed{\frac{d}{dt} \log p_t(\mathbf{x}) = - \nabla \cdot \mathbf{u}_t(\mathbf{x}) = -\mathbf{d}\,\frac{2t-1}{\sigma_t^2} = - \frac{\mathbf{d}\,\dot{\sigma}_t^2}{2\sigma_t^2}=-\frac{\mathbf{d}}{2}\frac{d}{dt}\log\sigma_t^2}\) 最後一式:$t < 0.5$ 時 $\frac{d}{dt}\log p_t > 0$,代表這個 flow 是壓縮;$t > 0.5$ 時 $\frac{d}{dt}\log p_t < 0$,代表這個 flow 是膨脹。
![[Pasted image 20250604104124.png]]
Next, we derive the closed-form vector field using the Conditional Flow Matching (CFM) definition from Lipman et al. (2023):
Method 2: Derivation using CFM Framework
Given:
- Prior distribution $p_0 = \mathcal{N}(-\boldsymbol{\mu}, \mathbf{I})$
- Target distribution $p_1 = \mathcal{N}(+\boldsymbol{\mu}, \mathbf{I})$
- Conditional vector field $u_t(x \mid x_1) = \frac{x_1 - x}{1-t}$ (linear interpolation path)
- Conditional probability path $p_t(x \mid x_1) = \mathcal{N}(x; \; tx_1 - (1-t)\boldsymbol{\mu}, \;(1-t)^2\mathbf{I})$
The marginal vector field is: \(u_t(x) = \mathbb{E}_{x_1 \sim p_{1|t}} \left[ u_t(x \mid x_1) \right] = \int u_t(x \mid x_1) \frac{p_t(x \mid x_1) p_1(x_1)}{p_t(x)} dx_1\)
Step 1: Identify Components
- $p_1(x_1) = \mathcal{N}(x_1; \boldsymbol{\mu}, \mathbf{I})$
- $p_t(x \mid x_1) = \mathcal{N}(x; tx_1 - (1-t)\boldsymbol{\mu}, (1-t)^2\mathbf{I})$
- $p_t(x) = \mathcal{N}(x; (2t-1)\boldsymbol{\mu}, (2t^2-2t+1)\mathbf{I})= \mathcal{N}(x; \boldsymbol{\mu}_t, \sigma^2_t \mathbf{I})$ (marginal distribution)
- $\boldsymbol{\mu}_t = (2t-1)\boldsymbol{\mu} , \,\,\sigma^2_t = 2t^2-2t+1$
The posterior is Gaussian: \(\begin{align} p_{1|t}(x_1|x) &= \mathcal{N}\left( x_1; \frac{tx + (1-t)\boldsymbol{\mu}}{2t^2-2t+1}, \frac{(1-t)^2}{2t^2-2t+1} \mathbf{I}\right)\\ &= \mathcal{N}\left( x_1; \frac{tx + (1-t)\boldsymbol{\mu}}{\sigma^2_t}, \frac{(1-t)^2}{\sigma^2_t} \mathbf{I}\right)\\ \end{align}\)
Step 2: Compute Expectation (Appendix C)
The conditional vector field should be: \(\begin{aligned} u_t(x) & =\mathbb{E}_{x_1 \sim p_{1 \mid t}}\left[u_t\left(x \mid x_1\right)\right] \\ & =\int u_t\left(x \mid x_1\right) \frac{p_t\left(x \mid x_1\right) q_1\left(x_1\right)}{p_t(x)} \mathrm{d} x_1 . \end{aligned}\)
因爲 conditional flow 是 Gaussian given $\mathbf{x}_1$ \(\begin{aligned} u_t(x\mid x_1) &= u(x_t\mid x_1) = \frac{d\psi_t(x_0)}{dt} = \frac{d x_t}{dt} = \dot{\sigma}_t(x_1) x_0 + \dot{\mu}_t(x_1)\\ &= \dot{\sigma}_t(x_1) \left[\frac{x_t - \mu_t(x_1)}{\sigma_t(x_1)}\right] + \dot{\mu}_t(x_1) \\ &= \frac{\dot{\sigma}_t(x_1)}{\sigma_t(x_1)} (x - \mu_t(x_1))+ \dot{\mu}_t(x_1) \\ \end{aligned}\) 但是 conditional Gaussian flow on $x_1$ 和 unconditional Gaussian flow 不同,是一個像錐體如下圖。 ![[Pasted image 20250514121948.png]]
Unconditional Gaussian flow 則是像束腰的圓柱體。
![[Pasted image 20250604104124.png]]
所以雖然都是 \(\boxed{\mathbf{u}_t(\mathbf{x}) = \dot{\boldsymbol{\mu}}_t + \frac{\dot{\sigma_t}}{\sigma_t} (\mathbf{x} - \boldsymbol{\mu}_t)}\)
**Conditional Vector Field: $u_t(\mathbf{x} \mid \mathbf{x}_1)$**
给定终端值 $\mathbf{x}_1$,条件路径为: \(\mathbf{x}_t \mid \mathbf{x}_1 = (1-t)\mathbf{x}_0 + t\mathbf{x}_1\) 条件分布为高斯: \(\mathbf{x}_t \mid \mathbf{x}_1 \sim \mathcal{N}\left( \boldsymbol{\mu}_t(\mathbf{x}_1), \sigma_t^2(\mathbf{x}_1) \mathbf{I} \right)\) 其中:
- $\boldsymbol{\mu}_t(\mathbf{x}_1) = (1-t)\boldsymbol{\mu}_0 + t\mathbf{x}_1$ 因爲最後收斂到 $x_1$
- $\sigma_t(\mathbf{x}_1) = (1-t)\sigma_0$(因为方差为 $(1-t)^2 \sigma_0^2$,标准差为 $(1-t)\sigma_0$,且 $t \in [0,1]$ 时 $1-t \geq 0$)
条件向量场由路径的导数给出: \(\begin{align} \mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1) &= \dot{\boldsymbol{\mu}}_t(\mathbf{x}_1) + \frac{\dot{\sigma_t}(\mathbf{x}_1)}{\sigma_t(\mathbf{x}_1)} (\mathbf{x}_t - \boldsymbol{\mu}_t(\mathbf{x}_1)) \\ &= \mathbf{x}_1 - \boldsymbol{\mu}_0 + \frac{-\sigma_0}{(1-t)\sigma_0}(\mathbf{x}_t-\boldsymbol{\mu}_t(\mathbf{x}_1)) \\ &=\frac{(\mathbf{x}_1 - \boldsymbol{\mu}_0)(1-t)}{1-t} + \frac{-\mathbf{x}_t+(1-t)\boldsymbol{\mu}_0 + t\mathbf{x}_1}{1-t}\\ &=\frac{\mathbf{x}_1 -\mathbf{x}_t}{1-t} \\ \end{align}\) 另一個方法是直接用直綫假説 \(\begin{align} \mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1) &= -\mathbf{x}_0 + \mathbf{x}_1 \end{align}\) 代入 $\mathbf{x}_0 = \frac{\mathbf{x}_t - t\mathbf{x}_1}{1-t}$: \(\mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1) = -\frac{\mathbf{x}_t - t \mathbf{x}_1}{1-t} + \frac{\mathbf{x}_1 -t\mathbf{x}_1} {1-t} = \frac{\mathbf{x}_1 -\mathbf{x}_t} {1-t}\)
**Unconditional Vector Field: $u_t(\mathbf{x})$**
无条件向量场是条件向量场关于后验分布 $p(\mathbf{x}_1 \mid \mathbf{x}_t)$ 的期望: \(\mathbf{u}_t(\mathbf{x}) = \mathbb{E}_{\mathbf{x}_1 \sim p(\mathbf{x}_1 \mid \mathbf{x}_t)} \left[ \mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1) \right]\)
Since $\mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1) = \frac{\mathbf{x}_1 - \mathbf{x}}{1-t}$ is linear in $\mathbf{x}_1$, and the posterior is Gaussian: \(\mathbf{u}_t(x) = \mathbb{E}_{\mathbf{x}_1 \sim p_{1\mid t}(\mathbf{x}_1|\mathbf{x})}\left[\frac{\mathbf{x}_1 - \mathbf{x}}{1-t} \right] = \frac{\overbrace{\mathbb{E}_{p_{1\mid t}}[\mathbf{x}_1 ]}^{\text{posterior mean}} - \mathbf{x}}{1-t}\)
由于联合分布 $(\mathbf{x}_t, \mathbf{x}_1)$ 是高斯分布,后验 $p(\mathbf{x}_1 \mid \mathbf{x}_t)$ 也是高斯分布。计算其均值和协方差:
- 联合分布: \(\begin{bmatrix} \mathbf{x}_t \\ \mathbf{x}_1 \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_t \\ \boldsymbol{\mu}_1 \end{bmatrix}, \begin{bmatrix} \sigma_t^2 \mathbf{I} & t\sigma_1^2 \mathbf{I} \\ t\sigma_1^2 \mathbf{I} & \sigma_1^2 \mathbf{I} \end{bmatrix} \right)\)
- 后验均值: \(\mathbb{E}[\mathbf{x}_1 \mid \mathbf{x}_t] = \boldsymbol{\mu}_1 + \frac{t\sigma_1^2}{\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t) = \boldsymbol{\mu} + \frac{t(\mathbf{x} - (2t-1)\boldsymbol{\mu})}{2t^2-2t+1}\)
代入期望: \(\mathbf{u}_t(\mathbf{x}) = \left[\boldsymbol{\mu} + \frac{t(\mathbf{x} - (2t-1)\boldsymbol{\mu})}{2t^2-2t+1} - \mathbf{x}\right]/(1-t)=\frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{2t^2-2t+1}\)
\[\boxed{\mathbf{u}_t(\mathbf{x}) = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{2t^2-2t+1} = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{\sigma^2_t}}\]
Unconditional Flow Geometric Interpretation
看下圖比較清楚:想像是一團 centered at (-10, 0) 的點,隨著時間往 (+10, 0) 移動的過程。綠色是 trace, 是每個點經過 vector field 被改變之後的 trace. 可以想像是微分方程的解。
$\frac{d\boldsymbol{x}_t}{dt} = \boldsymbol{u}_t(\boldsymbol{x_t})$
- For $t < 0.5$: the term $(2t-1)$ is negative, 所以是一個壓縮的流。
- For $t > 0.5$: the term $(2t-1)$ is positive, 所以是一個膨脹的流。
![[Pasted image 20250604104124.png]]
例二:General G2G with Scaled Identity Covariance (Appendix E)
Under independent coupling:
- $\mathbf{x}_0 \sim \mathcal{N}(\boldsymbol{\mu}_0, \sigma_0^2 \mathbf{I})$
- $\mathbf{x}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \sigma_1^2 \mathbf{I})$
With linear interpolation $\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$, the marginal is Gaussian: \(\mathbf{x}_t \sim \mathcal{N}(\boldsymbol{\mu}_t, \sigma_t^2 \mathbf{I}),\) where:
- Mean: $\boldsymbol{\mu}_t = (1-t)\boldsymbol{\mu}_0 + t\boldsymbol{\mu}_1$
- Variance: $\sigma_t^2 = (1-t)^2 \sigma_0^2 + t^2 \sigma_1^2$
The form below holds generally for isotropic Gaussians $p_0 = \mathcal{N}(\boldsymbol{\mu}_0, \sigma_0^2 \mathbf{I})$ and $p_1 = \mathcal{N}(\boldsymbol{\mu}_1, \sigma_1^2 \mathbf{I})$ under independent coupling. 而且下式是座標無關形式!
\(\boxed{\mathbf{u}_t(\mathbf{x}) = \dot{\boldsymbol{\mu}}_t + \frac{\dot{\sigma_t^{2}}}{2\sigma_t^{2}} (\mathbf{x} - \boldsymbol{\mu}_t)}\) 補充一下,因爲 $\dot{\sigma_t^{2}}=2 \sigma_t \dot{\sigma_t}$. 所以上式也可以寫成: \(\mathbf{u}_t(\mathbf{x}) = \dot{\boldsymbol{\mu}}_t + \frac{\dot{\sigma_t^{2}}}{2\sigma_t^{2}} (\mathbf{x} - \boldsymbol{\mu}_t)= \dot{\boldsymbol{\mu}}_t + \frac{\dot{\sigma_t}}{\sigma_t} (\mathbf{x} - \boldsymbol{\mu}_t)\) 看起來更簡潔和直觀,但是對於 general form 會有一點 messy, 所以我們 keep both forms.
例一: $\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0 = 2\boldsymbol{\mu}$, ${\sigma}^2_t = 2t^2-2t+1$, $\dot{\sigma^2_t} = 2(2t-1)$, $\boldsymbol{\mu}_t = (2t-1)\boldsymbol{\mu}$, $\mathbf{d}=2$ (dimension), 帶入得到 \(\boxed{\mathbf{u}_t(\mathbf{x}) = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{2t^2-2t+1} = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{\sigma^2_t}}\)
同樣 conservation of probability 如下,也是座標無關形式! \(\boxed{\frac{d}{dt} \log p_t(\mathbf{x}) = - \nabla \cdot \mathbf{u}_t(\mathbf{x}) = -\frac{\mathbf{d}}{2}\frac{ d}{dt}\log\sigma_t^2=-\mathbf{d}\frac{ d}{dt}\log\sigma_t}\)
例三:General G2G with Full Rank Covariance (Appendix G)
The vector field $u_t(x)$ for flow matching between two Gaussian distributions $p_0 = \mathcal{N}(\boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)$ and $p_1 = \mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1)$ is derived under a Gaussian probability path where the mean and covariance are linearly interpolated:
\(\boldsymbol{\mu}_t = (1 - t) \boldsymbol{\mu}_0 + t \boldsymbol{\mu}_1\) \(\boldsymbol{\Sigma}_t = (1 - t)^2 \boldsymbol{\Sigma}_0 + t^2 \boldsymbol{\Sigma}_1\)
The vector field is given by:
\(\mathbf{u}_t(x) = \underbrace{\dot{\boldsymbol{\mu}}_t}_{\text{Mean component}} + \underbrace{\frac{1}{2} \dot{\boldsymbol{\Sigma}}_t \boldsymbol{\Sigma}_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t)}_{\text{Covariance component}}\) \(\boxed{\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \left[ t \boldsymbol{\Sigma}_1 - (1-t) \boldsymbol{\Sigma}_0 \right] \boldsymbol{\Sigma}_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t)}\)
If $\boldsymbol{\Sigma}_0$ and $\boldsymbol{\Sigma}_1$ commute, the flow simplifies to: (Appendix H) \(\boxed{\mathbf{x}(t) = \boldsymbol{\mu}_t + \boldsymbol{\Sigma}_t^{1/2} \boldsymbol{\Sigma}_0^{-1/2} \left( \mathbf{x}_0 - \boldsymbol{\mu}_0 \right)}\)
例四:Full-Rank Gaussian-to-Low-Rank Gaussian (Appendix E)
![[Pasted image 20250607094307.png]] 我們假設 $p_1 \sim N(\mu_1, \Sigma_{min})$ and $p_0 \sim N(\mu_0, \Sigma_{max})$
假設 $x_1 \in \{$"貓", "狗"$\}$ 的機率為 10%, 90%. Marginal field 是 conditional fields 的 posterior 加權平均:$u_t(x) = \sum_{x_1} u_t(x \mid x_1)\, p(x_1 \mid x_t = x)$. 在 $t=0$ 時 $x_t$ 和 $x_1$ 獨立,權重就是 prior 的 10%, 90%;到 $t=1$ 時 posterior 收斂到 $x_t$ 對應的那一個 $x_1$. 中間的 $t$ 則是兩者之間的連續過渡。
Why Conditional Flow Matching?
Why conditional vector field? 因為 flow matching 是 sampling from $p_t(x)$,但是 conditional flow matching 可以從 data $q(x_1)$ sampling 來 training.
- 但是還是有 $p_t(x\mid x_1)$ 才能 training? 如同 diffusion 的 transition probability: 老把戲是 Gaussian. $p_t(x\mid x_1)\sim N(\mu_t(x_1), \sigma^2_t(x_1) I)$
- 這個 Gaussian 等價於: $x_{t\mid 1} = \mu_t(x_1) + \sigma_t(x_1) \cdot z, \quad z\sim N(0, I)$
- 因為 $z$ 和 $x_0$ 一樣 $N(0, I)$,也可以改成:$x_{t\mid 1} = \mu_t(x_1) + \sigma_t(x_1) x_0$
- 再因為 $x_{t\mid 1} = \phi(x_0\mid x_1)=\psi(x_0)$, 所以也可以寫:$\psi(x_0) = \mu_t(x_1) + \sigma_t(x_1) x_0$
對應的 conditional vector field 是: \(u_t(x\mid x_1) = \frac{d\psi_t(x_0)}{dt} = \frac{d x_t}{dt} = \dot{\sigma}_t(x_1) x_0 + \dot{\mu}_t(x_1)\) 一般用 $x_0$ 比較好,因爲可以用來 sampling from $\mathcal{N}(0, I)$ 做 flow matching training!
但也可以把 $x_0$ 換成 $x_t$ 用上面的 Gaussian,如此得到 instant conditional vector field at $t$. \(\begin{aligned} u_t(x\mid x_1) &= u(x_t\mid x_1) = \frac{d\psi_t(x_0)}{dt} = \frac{d x_t}{dt} = \dot{\sigma}_t(x_1) x_0 + \dot{\mu}_t(x_1)\\ &= \dot{\sigma}_t(x_1) \left[\frac{x_t - \mu_t(x_1)}{\sigma_t(x_1)}\right] + \dot{\mu}_t(x_1) \\ &= \frac{\dot{\sigma}_t(x_1)}{\sigma_t(x_1)} (x - \mu_t(x_1))+ \dot{\mu}_t(x_1) \\ \end{aligned}\) Conditional vector field 的形式和 unconditional vector field 一樣!但是 conditional vector field 是 given $x_1$,所以值完全不同!
[!NOTE] 如果是 OT, $u_t(x\mid x_1) = \frac{x_1 - x}{1-t}$,就是指向 $x_1$ 的直綫斜率;它對 $x_1$ 是 linear 的,而 posterior 是 Gaussian,所以取期望值很容易。
上式的好處是如果我們已經知道 $p_t$ 在時間和空間的分佈, i.e. $p(x, t)$ from Fokker-Planck equation,可以直接轉換成 flow!! \(p_t(x\mid x_1)\sim N(\mu_t(x_1), \sigma^2_t(x_1) I)\)
這個 conditional flow 用圖看比較清楚。 最後 reach $x_1=X_1$, 從一個 fat initial condition ($\sigma_t(x_1)$ 隨時間變小),但是最終收斂到 $\mu_t(x_1)=X_1$ ![[Pasted image 20250514121948.png]]
如何 sample:$\psi(x_0) = \mu_t(x_1) + \sigma_t(x_1) x_0$ and $\frac{d}{dt}\psi_t(x_0)$?
$t \sim [0, 1]$. $x_1$ 是直接從 data set sample 的 image $q(x_1)$. $x_0 \sim N(0, I)$ 也非常簡單。
Sampling (from $x_0$ and $u_t$ to get $x_t$)
最重要的是 $x_t$不是直線!因為 $\mathbf{u}_t(\mathbf{x}_t)$ 是一個平均的結果,不是一個 constant vector! 但是 condition vector 是直線 (in the OT case).
理論上非常簡單,就是解一個 ODE with initial condition $\mathbf{x}_0$: \(\frac{d\mathbf{x}}{dt} = \mathbf{u}_t(\mathbf{x})\) For 例一, the exact solution of this ODE is \(\mathbf{x}_t = (2t-1)\boldsymbol{\mu} + \sqrt{2t^2 - 2t + 1} \cdot (\mathbf{x}_0 + \boldsymbol{\mu})\)
- At $t=0$: $\mathbf{x}_0 = -\boldsymbol{\mu} + \mathbf{z}$ (where $\mathbf{z} = \mathbf{x}_0 + \boldsymbol{\mu} \sim \mathcal{N}(0, \mathbf{I})$).
- At $t=1$: $\mathbf{x}_1 = \boldsymbol{\mu} + \mathbf{z} = \mathbf{x}_0 + 2\boldsymbol{\mu}$ (a sample from $p_1$).
因爲 flow $\phi_t(\mathbf{x}_0) = \mathbf{x}_t$ \(\phi_t(\mathbf{x}_0) =\mathbf{x}_t = (2t-1)\boldsymbol{\mu} + \sqrt{2t^2 - 2t + 1} \cdot (\mathbf{x}_0 + \boldsymbol{\mu})\) The path (flow $\phi(x_0)$) is curved due to the $\sqrt{2t^2 - 2t + 1}$ term, which is nonlinear in $t$.
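可以用 central finite difference 驗證這個 flow 確實滿足 $\frac{d\mathbf{x}}{dt}=\mathbf{u}_t(\mathbf{x})$(sketch):

```python
import numpy as np

mu = np.array([10.0, 0.0])

def phi(t, x0):
    """Exact flow: x_t = (2t-1) mu + sqrt(2t^2 - 2t + 1) (x0 + mu)."""
    return (2 * t - 1) * mu + np.sqrt(2 * t**2 - 2 * t + 1) * (x0 + mu)

def u(t, x):
    """Closed-form marginal vector field of 例一."""
    return ((2 * t - 1) * x + mu) / (2 * t**2 - 2 * t + 1)

x0 = np.array([-10.0, 1.0])
h = 1e-6
errs = []
for t in (0.1, 0.3, 0.5, 0.7, 0.9):
    dxdt = (phi(t + h, x0) - phi(t - h, x0)) / (2 * h)  # central difference
    errs.append(np.abs(dxdt - u(t, phi(t, x0))).max())
```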
通用的表示:(Two independent Gaussians, Appendix E and F) \(\boxed{\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{1}{2}\frac{d \log \sigma_t^{2}}{dt} (\mathbf{x} - \boldsymbol{\mu}_t)}\) \(\begin{aligned} \mathbf{x}(t) &= \left[(1-t) \boldsymbol{\mu}_0 + t \boldsymbol{\mu}_1\right] + \dfrac{\sqrt{(1-t)^2 \sigma_0^2 + t^2 \sigma_1^2}}{\sigma_0} (\mathbf{x}_0 - \boldsymbol{\mu}_0)\\ \phi_t(\mathbf{x}_0) &= \mathbf{x}(t)=\boldsymbol{\mu}_t + \dfrac{\sigma_t}{\sigma_0} (\mathbf{x}_0 - \boldsymbol{\mu}_0)\\ \end{aligned}\)
- $\boldsymbol{\mu}_t = (1-t) \boldsymbol{\mu}_0 + t \boldsymbol{\mu}_1 = \boldsymbol{\mu}_0 + t(\boldsymbol{\mu}_1-\boldsymbol{\mu}_0)$ 是以 $\boldsymbol{\mu}_0$ 為起點,斜率為 $(\boldsymbol{\mu}_1-\boldsymbol{\mu}_0)$ 的直綫。只有在$\mathbf{x}_0 = \boldsymbol{\mu}_0$ 才會走這條直綫。當 $\mathbf{x}_0 \ne \boldsymbol{\mu}_0$ 偏離的部分就會照 standard deviation 比例 ($\frac{\sigma_t}{\sigma_0}$) 加到這條直綫。
- $t =0, \phi_0(x_0) = x_0$
- $t =1, \phi_1(x_0) = \mu_1 + \frac{\sigma_1}{\sigma_0}(x_0-\mu_0)$. 如果 $\sigma_1=\sigma_0=\sigma$ , $\phi_1(x_0) = x_0+(\mu_1-\mu_0)$. 即是所有的終點都是起點加上 mean difference.
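$\phi_1$ 的行為可以 empirically 驗證:把 $p_0$ 的 samples 推到 $t=1$ 後,mean/std 應該變成 $\boldsymbol{\mu}_1, \sigma_1$(sketch;$\boldsymbol{\mu}_1=[5,3], \sigma_1=2$ 是隨意假設的參數):

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, s0 = np.array([-10.0, 0.0]), 1.0
mu1, s1 = np.array([5.0, 3.0]), 2.0     # hypothetical target parameters

def phi(t, x0):
    """phi_t(x0) = mu_t + (sigma_t / sigma_0) (x0 - mu_0)."""
    mu_t = (1 - t) * mu0 + t * mu1
    sigma_t = np.sqrt((1 - t)**2 * s0**2 + t**2 * s1**2)
    return mu_t + (sigma_t / s0) * (x0 - mu0)

x0 = mu0 + s0 * rng.standard_normal((100_000, 2))  # samples from p0
xT = phi(1.0, x0)                                  # push every sample to t = 1
```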
General Expression
$\boldsymbol{\mu}_t = (1 - t) \boldsymbol{\mu}_0 + t \boldsymbol{\mu}_1$ and $\boldsymbol{\Sigma}_t = (1 - t)^2 \boldsymbol{\Sigma}_0 + t^2 \boldsymbol{\Sigma}_1$ If $\boldsymbol{\Sigma}_0$ and $\boldsymbol{\Sigma}_1$ commute, the flow simplifies to: (Appendix H) \(\boxed{\phi_t(\mathbf{x}_0)=\mathbf{x}(t) = \boldsymbol{\mu}_t + \boldsymbol{\Sigma}_t^{1/2} \boldsymbol{\Sigma}_0^{-1/2} \left( \mathbf{x}_0 - \boldsymbol{\mu}_0 \right)}\)
Example: $\mathbf{x}_0 = [-10, 1]^\top$
- Target: $\mathbf{x}_1 = \mathbf{x}_0 + 2\boldsymbol{\mu} = [10, 1]^\top$.
- Trajectory:
\(\mathbf{x}_t = \begin{bmatrix} 20t - 10 \\ \sqrt{2t^2 - 2t + 1} \end{bmatrix}\) - Positions:
- $t=0$: $[-10, 1]^\top$
- $t=0.5$: $[0, \sqrt{0.5}]^\top \approx [0, 0.707]^\top$
- $t=1$: $[10, 1]^\top$.
Why Straight Lines Do Not Occur:
- Independent coupling:
- For a fixed pair $(\mathbf{x}_0, \mathbf{x}_1)$, the conditional path is straight: $\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$.
- However, $\mathbf{u}_t(\mathbf{x})$ is the marginal field (average over all $\mathbf{x}_1$), so individual paths curve to reconcile all possible endpoints.
- Geometry:
- Straight lines require $\frac{d^2\mathbf{x}_t}{dt^2} = 0$. Here, acceleration is nonzero:
\(\frac{d\mathbf{u}_t}{dt} \neq 0 \implies \text{curved paths}.\)
Conclusion:
直線的猜想是 wrong。Under the given marginal vector field $\mathbf{u}_t(\mathbf{x})$:
- Samples from $p_0$ travel to $\mathbf{x}_0 + 2\boldsymbol{\mu}$ (a valid sample from $p_1$).
- The trajectory is not a straight line unless $\mathbf{x}_0 = -\boldsymbol{\mu}$ (mean of $p_0$).
- Curved paths arise from the independent coupling, where the vector field averages over all possible $\mathbf{x}_1$.
Key Takeaway: The marginal flow matches the distributions $p_0 \to p_1$ but follows curved trajectories. For straight lines, use conditional flow matching (Lipman et al.) with paired samples $(\mathbf{x}_0, \mathbf{x}_1)$.
其實最後關鍵就是如何選兩個參數
$\mu_t(x_1), \sigma_t(x_1)$ and $\dot{\mu}_t(x_1), \dot{\sigma}_t(x_1)$
with boundary condition
- $\mu_1(x_1) = x_1$, $\sigma_1(x_1)=\sigma_{min}$
- $\mu_0(x_1) = 0$, $\sigma_0(x_1)=1$
我們看一些例子。
Example I: Optimal Transport (OT) conditional VF (Vector Field)
最簡單就是線性內差: $\mu_t(x_1) = t x_1$, $\sigma_t(x_1) = 1-(1-\sigma_{min})\,t$ $\psi_t(x_0) = x_t = t x_1 + (1-(1-\sigma_{min})t) x_0$
對應的 condition vector field,物理意義非常簡單,就是一個 constant field 和 sampled 的 $t$ 無關!而且是 $x_0$ 和目標的$x_1$ 的向量差!就是一路直衝終點! Wrong, 我們不知道 $x_1$! \(\frac{\psi_t(x_0)}{dt} = \frac{d x_t}{dt} = x_1 - (1-\sigma_{min})x_0\approx x_1 - x_0\)
如果以 $x_t$ local or instant 角度的 conditional vector field: \(\begin{aligned} u_t(x\mid x_1) = u(x_t\mid x_1) &= x_1 - (1-\sigma_{min}) \frac{x_t - t x_1}{1-(1-\sigma_{min})t}\\ &= \frac{x_1 - (1-\sigma_{min}) x_t}{1-(1-\sigma_{min})t}\\ &= \frac{x_1 - (1-\sigma_{min}) x}{1-(1-\sigma_{min})t}\\ \end{aligned}\)
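兩種寫法($\frac{d\psi_t}{dt}$ 形式 vs. 以 $x_t$ 表示的 instant 形式)在 path 上 $x = \psi_t(x_0)$ 應該逐點一致,可以數值驗證(sketch,$\sigma_{min}=0.01$ 是假設值):

```python
import numpy as np

rng = np.random.default_rng(0)
sigma_min = 1e-2

def psi(t, x0, x1):
    """OT conditional flow: psi_t(x0) = t x1 + (1 - (1 - sigma_min) t) x0."""
    return t * x1 + (1 - (1 - sigma_min) * t) * x0

x0 = rng.standard_normal(5)
x1 = rng.standard_normal(5) + 10.0
for t in (0.0, 0.25, 0.5, 0.75, 0.99):
    xt = psi(t, x0, x1)
    form_a = x1 - (1 - sigma_min) * x0                             # d psi_t / dt
    form_b = (x1 - (1 - sigma_min) * xt) / (1 - (1 - sigma_min) * t)
    assert np.allclose(form_a, form_b)   # identical along the conditional path
```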
OT Summary
$t = 1$ BC (boundary condition): $\mu_1(x_1) = x_1$, $\sigma_1(x_1)=\sigma_{min}$
- conditional flow: $\psi_1(x_0) = x_1 + \sigma_{min} x_0 \approx x_1$, mean and variance aligned with BC
- conditional vector field: $u_1(x \mid x_1)=u(x_1\mid x_1)=\frac{\sigma_{min} x_1}{\sigma_{min}} = x_1$, 好像有點怪怪的
假設 $t=1-\Delta t$
\(\begin{aligned} u_{1-\Delta t}(x\mid x_1) &= u(x_{1-\Delta t}\mid x_1) = \frac{x_1 - (1-\sigma_{min}) x_{1-\Delta t}}{1-(1-\sigma_{min})(1-\Delta t)}\\
&\approx\frac{x_1 - x_{1-\Delta t} + \sigma_{min} x_{1-\Delta t}}{\Delta t +\sigma_{min}}\\
\end{aligned}\)
所以在 $\Delta t$ 比較大的時候,$u_{1-\Delta t} \approx \frac{d x_1}{d t}$, 還是 flow 在 dominate.
但等到 $\Delta t$ 接近無窮小,$u_{1-\Delta t} \approx x_1$, 就是指到 $x_1$
$t = 0$ BC (boundary condition): $\mu_0(x_1) = 0$, $\sigma_0(x_1)=1$
- conditional flow: $\psi_0(x_0) = x_0 \sim N(0, I)$, aligned with boundary condition
- conditional vector field: $u_0(x \mid x_1)=u(x_0\mid x_1)=x_1-(1-\sigma_{min})x_0\approx x_1 - x_0$
這個部分的結果乍看和之前抵觸!!在 independent coupling 下 $x_0$ 和 $x_1$ 完全不相關,但 conditional vector field $u_0(x \mid x_1)=x_1-x_0$ 卻把兩者綁在一起。其實不矛盾:conditional field 是 given $x_1$ 的物件,marginal field 才是對 $x_1$ 取期望的結果。
OT 的物理意義是 $v_t(x_t)$ (neural network vector field) 在任何時間的 vector field 就是 $x_1-x_0$,assuming $\sigma_{min} \approx 0$. 非常簡單到不像話!!
![[Pasted image 20250514145237.png]]
Reference
MIT 6.S184: Flow Matching and Diffusion Models https://www.youtube.com/watch?v=GCoP2w-Cqtg&t=28s&ab_channel=PeterHolderrieth
Yaron Meta paper: [2210.02747] Flow Matching for Generative Modeling
An Introduction to Flow Matching: https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html
Appendix C
Compute Expectation (Appendix C)
The conditional vector field should be: ![[Pasted image 20250603200549.png]]
Since $u_t(x \mid x_1) = \frac{x_1 - x}{1-t}$ is linear in $x_1$, and the posterior is Gaussian: \(u_t(x) = \mathbb{E}_{x_1 \sim p_{1\mid t}(x_1|x)}\left[\frac{x_1 - x}{1-t} \right] = \frac{\overbrace{\mathbb{E}_{p_{1\mid t}}[x_1 ]}^{\text{posterior mean}} - x}{1-t}\) \(\boxed{\mathbf{u}_t(\mathbf{x}) = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{2t^2-2t+1} = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{\sigma^2_t}}\)
Geometric Interpretation
看下圖比較清楚:想像是一團 centered at (-10, 0) 的點,隨著時間往 (+10, 0) 移動的過程。綠色是 trace, 是每個點經過 vector field 被改變之後的 trace. 可以想像是微分方程的解。
$\frac{d\boldsymbol{x}_t}{dt} = \boldsymbol{u}_t(\boldsymbol{x_t})$
- For $t < 0.5$: the term $(2t-1)$ is negative, 所以是一個壓縮的流。
- For $t > 0.5$: the term $(2t-1)$ is positive, 所以是一個膨脹的流。
![[Pasted image 20250604104124.png]]
The posterior mean is: \(\mathbb{E}[x_1 \mid x_t = x] = \frac{tx + (1-t)\boldsymbol{\mu}}{2t^2-2t+1}\)
Step 3: Substitute and Simplify
\(u_t(x) = \frac{1}{1-t} \left( \frac{tx + (1-t)\boldsymbol{\mu}}{2t^2-2t+1} - x \right)\)
\[= \frac{1}{1-t} \cdot \frac{t x + (1-t)\boldsymbol{\mu} - x(2t^2-2t+1)}{2t^2-2t+1} = \frac{1}{1-t} \cdot \frac{(3t - 2t^2 - 1)x + (1-t)\boldsymbol{\mu}}{2t^2-2t+1}\] 因為 $3t - 2t^2 - 1 = (1-t)(2t-1)$,約掉 $(1-t)$ 得 \(\boxed{\mathbf{u}_t(\mathbf{x}) = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{2t^2-2t+1}}\)
Verification.
- At $t = 0$:
\(\mathbf{u}_0(\mathbf{x}) = \frac{(-1)\mathbf{x} + \boldsymbol{\mu}}{1} = -\mathbf{x}+\boldsymbol{\mu}\)
- At prior mean $\mathbf{x} = -\boldsymbol{\mu}$: $\mathbf{u}_0(-\boldsymbol{\mu}) = -(-\boldsymbol{\mu})+\boldsymbol{\mu} = 2\boldsymbol{\mu}$ (the velocity equals the total displacement $2\boldsymbol{\mu}$ toward the target mean, as expected).
- At $t = 1$:
\(\mathbf{u}_1(\mathbf{x}) = \frac{(2-1)\mathbf{x} + \boldsymbol{\mu}}{2-2+1} = \mathbf{x} + \boldsymbol{\mu}\)
- At target mean $\mathbf{x} = \boldsymbol{\mu}$: $\mathbf{u}_1(\boldsymbol{\mu}) = \boldsymbol{\mu} + \boldsymbol{\mu} = 2\boldsymbol{\mu}$ (consistent with linear interpolation).
Implementation
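A minimal implementation sketch (my own, under the assumptions of this example: $p_0 = \mathcal{N}(-\boldsymbol{\mu}, \mathbf{I})$ with $\boldsymbol{\mu} = [10, 0]^\top$): Euler-integrate $\frac{dx}{dt} = u_t(x)$ for a cloud of samples and check that each sample lands near $x_0 + 2\boldsymbol{\mu}$.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([10.0, 0.0])

def u(x, t):
    # marginal vector field u_t(x) = ((2t-1)*x + mu) / (2t^2 - 2t + 1)
    return ((2 * t - 1) * x + mu) / (2 * t**2 - 2 * t + 1)

n_steps, n_samples = 1000, 5000
dt = 1.0 / n_steps
x = rng.normal(size=(n_samples, 2)) - mu      # x0 ~ N(-mu, I)
x0 = x.copy()
for k in range(n_steps):
    x = x + dt * u(x, k * dt)                 # forward Euler

# closed-form prediction: every sample is transported to x0 + 2*mu
err = np.abs(x - (x0 + 2 * mu)).max()
print(err)
```

With 1000 Euler steps the worst-case deviation from the closed-form endpoint is already small, and the empirical mean of the transported cloud sits at $+\boldsymbol{\mu}$, as required.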
In fact, this is also the Schrödinger bridge.
![[Pasted image 20250602232444.png]]
One-sided flow matching? Two-sided flow matching?
Appendix D
To derive the conditional expectation $\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}]$ for the linear interpolation path $\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$, where $\mathbf{x}_0 \sim \mathcal{N}(-\boldsymbol{\mu}, \mathbf{I})$ and $\mathbf{x}_1 \sim \mathcal{N}(\boldsymbol{\mu}, \mathbf{I})$ are independent, follow these steps:
Step 1: Define Joint Distribution
The vector $\begin{bmatrix} \mathbf{x}_0 \\ \mathbf{x}_t \end{bmatrix}$ is jointly Gaussian since $\mathbf{x}_t$ is a linear combination of $\mathbf{x}_0$ and $\mathbf{x}_1$. Compute its moments:
- Means:
\(\mathbb{E}[\mathbf{x}_0] = -\boldsymbol{\mu}, \quad \mathbb{E}[\mathbf{x}_t] = (1-t)(-\boldsymbol{\mu}) + t(\boldsymbol{\mu}) = (2t-1)\boldsymbol{\mu}.\)
- Covariances:
\(\text{Cov}(\mathbf{x}_0) = \mathbf{I}, \quad \text{Cov}(\mathbf{x}_t) = (1-t)^2\mathbf{I} + t^2\mathbf{I} = \sigma_t^2 \mathbf{I}, \quad \sigma_t^2 = 2t^2 - 2t + 1.\)
- Cross-covariance:
\(\text{Cov}(\mathbf{x}_0, \mathbf{x}_t) = \mathbb{E}[(\mathbf{x}_0 + \boldsymbol{\mu})(\mathbf{x}_t - (2t-1)\boldsymbol{\mu})^\top] = (1-t)\mathbf{I},\)
since $\mathbf{x}_1 - \boldsymbol{\mu}$ is independent of $\mathbf{x}_0 + \boldsymbol{\mu}$ and has zero mean.
Step 2: Apply Gaussian Conditioning Formula
For jointly Gaussian vectors $\begin{bmatrix} \mathbf{a} \\ \mathbf{b} \end{bmatrix} \sim \mathcal{N}\left( \begin{bmatrix} \boldsymbol{\mu}_a \\ \boldsymbol{\mu}_b \end{bmatrix}, \begin{bmatrix} \Sigma_{aa} & \Sigma_{ab} \\ \Sigma_{ba} & \Sigma_{bb} \end{bmatrix} \right)$,
\(\mathbb{E}[\mathbf{a} \mid \mathbf{b} = \mathbf{x}] = \boldsymbol{\mu}_a + \Sigma_{ab} \Sigma_{bb}^{-1} (\mathbf{x} - \boldsymbol{\mu}_b).\)
Here, $\mathbf{a} = \mathbf{x}_0$, $\mathbf{b} = \mathbf{x}_t$, and:
\(\boldsymbol{\mu}_a = -\boldsymbol{\mu}, \quad \boldsymbol{\mu}_b = (2t-1)\boldsymbol{\mu}, \quad \Sigma_{ab} = (1-t)\mathbf{I}, \quad \Sigma_{bb} = \sigma_t^2 \mathbf{I}.\)
Step 3: Substitute and Simplify
\(\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = -\boldsymbol{\mu} + \left[(1-t)\mathbf{I}\right] \left[\sigma_t^2 \mathbf{I}\right]^{-1} (\mathbf{x} - (2t-1)\boldsymbol{\mu}).\)
Since $\left[\sigma_t^2 \mathbf{I}\right]^{-1} = \frac{1}{\sigma_t^2} \mathbf{I}$:
\(= -\boldsymbol{\mu} + \frac{1-t}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}).\)
Final Result
\(\boxed{\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = -\boldsymbol{\mu} + \dfrac{1-t}{\sigma_t^{2}} \left( \mathbf{x} - (2t-1)\boldsymbol{\mu} \right)}\)
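This conditional mean is linear in $\mathbf{x}$, so it can be checked by least-squares regression on samples (a sketch I added; scalar case for simplicity, with arbitrary test values):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, t, n = 2.0, 0.3, 1_000_000    # arbitrary scalar test values

x0 = rng.normal(-mu, 1.0, n)      # x0 ~ N(-mu, 1)
x1 = rng.normal(+mu, 1.0, n)      # x1 ~ N(+mu, 1), independent coupling
xt = (1 - t) * x0 + t * x1

# least-squares fit of E[x0 | x_t] = a * x_t + b
a, b = np.polyfit(xt, x0, 1)

# closed form: E[x0 | x_t = x] = -mu + (1-t)/sigma_t^2 * (x - (2t-1)*mu)
s2 = 2 * t**2 - 2 * t + 1
slope = (1 - t) / s2
intercept = -mu - slope * (2 * t - 1) * mu
print(a, slope, b, intercept)
```

The fitted slope and intercept agree with the regression coefficient $\frac{1-t}{\sigma_t^2}$ and the boxed intercept to within Monte Carlo error.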
Intuition
The term $\frac{1-t}{\sigma_t^2}$ represents the regression coefficient adjusting for the correlation between $\mathbf{x}_0$ and $\mathbf{x}_t$. The expression linearly combines the prior mean $-\boldsymbol{\mu}$ with the deviation of $\mathbf{x}$ from the marginal mean $(2t-1)\boldsymbol{\mu}$, scaled by the relative variance contribution of $\mathbf{x}_0$ to $\mathbf{x}_t$.
Appendix E
The vector field is given by: \(\mathbf{u}_t(\mathbf{x}) = \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{2t^2 - 2t + 1}, \quad \boldsymbol{\mu} = [10, 0]^\top\)
This vector field defines the marginal flow (not conditional paths). To determine whether samples follow straight lines from $\mathbf{x}_0$ to $\mathbf{x}_1$, we analyze the trajectory under this field.
Key Insight:
- The marginal vector field $\mathbf{u}_t(\mathbf{x})$ is derived from the independent coupling of $p_0$ and $p_1$, where $\mathbf{x}_0 \sim \mathcal{N}(-\boldsymbol{\mu}, \mathbf{I})$ and $\mathbf{x}_1 \sim \mathcal{N}(+\boldsymbol{\mu}, \mathbf{I})$ are sampled independently.
- For a fixed $\mathbf{x}_0$, the endpoint $\mathbf{x}_1$ is not unique (since $\mathbf{x}_1$ is random and independent of $\mathbf{x}_0$).
- The flow under $\mathbf{u}_t(\mathbf{x})$ transports $\mathbf{x}_0$ to $\mathbf{x}_0 + 2\boldsymbol{\mu}$ (a sample from $p_1$), but the path is not straight in general.
Trajectory Analysis:
The exact solution of the ODE $\frac{d\mathbf{x}_t}{dt} = \mathbf{u}_t(\mathbf{x}_t)$ with initial condition $\mathbf{x}_0$ is: \(\mathbf{x}_t = (2t-1)\boldsymbol{\mu} + \sqrt{2t^2 - 2t + 1} \cdot (\mathbf{x}_0 + \boldsymbol{\mu})\)
- At $t=0$: $\mathbf{x}_0 = -\boldsymbol{\mu} + \mathbf{z}$ (where $\mathbf{z} = \mathbf{x}_0 + \boldsymbol{\mu} \sim \mathcal{N}(0, \mathbf{I})$).
- At $t=1$: $\mathbf{x}_1 = \boldsymbol{\mu} + \mathbf{z} = \mathbf{x}_0 + 2\boldsymbol{\mu}$ (a sample from $p_1$).
The path is curved due to the $\sqrt{2t^2 - 2t + 1}$ term, which is nonlinear in $t$.
Example: $\mathbf{x}_0 = [-10, 1]^\top$
- Target: $\mathbf{x}_1 = \mathbf{x}_0 + 2\boldsymbol{\mu} = [10, 1]^\top$.
- Trajectory:
\(\mathbf{x}_t = \begin{bmatrix} 20t - 10 \\ \sqrt{2t^2 - 2t + 1} \end{bmatrix}\)
- Positions:
- $t=0$: $[-10, 1]^\top$
- $t=0.5$: $[0, \sqrt{0.5}]^\top \approx [0, 0.707]^\top$
- $t=1$: $[10, 1]^\top$.
The $y$-component dips to $\approx 0.707$ at $t=0.5$, so the path is not the straight line to $[10, 1]^\top$.
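These positions can be reproduced directly from the closed-form trajectory (a small check I added; the helper `traj` is not from the note):

```python
import numpy as np

def traj(t, x0, mu=np.array([10.0, 0.0])):
    # closed-form flow: x_t = (2t-1)*mu + sqrt(2t^2 - 2t + 1) * (x0 + mu)
    s = np.sqrt(2 * t**2 - 2 * t + 1)
    return (2 * t - 1) * mu + s * (x0 + mu)

x0 = np.array([-10.0, 1.0])
p0, p_half, p1 = traj(0.0, x0), traj(0.5, x0), traj(1.0, x0)
print(p0, p_half, p1)
```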
Why Straight Lines Do Not Occur:
- Independent coupling:
- For a fixed pair $(\mathbf{x}_0, \mathbf{x}_1)$, the conditional path is straight: $\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$.
- However, $\mathbf{u}_t(\mathbf{x})$ is the marginal field (average over all $\mathbf{x}_1$), so individual paths curve to reconcile all possible endpoints.
- Geometry:
- Straight lines require $\frac{d^2\mathbf{x}_t}{dt^2} = 0$. Here, acceleration is nonzero:
\(\frac{d\mathbf{u}_t}{dt} \neq 0 \implies \text{curved paths}.\)
Conclusion:
No: samples do not follow straight lines. Under the given marginal vector field $\mathbf{u}_t(\mathbf{x})$:
- Samples from $p_0$ travel to $\mathbf{x}_0 + 2\boldsymbol{\mu}$ (a valid sample from $p_1$).
- The trajectory is not a straight line unless $\mathbf{x}_0 = -\boldsymbol{\mu}$ (mean of $p_0$).
- Curved paths arise from the independent coupling, where the vector field averages over all possible $\mathbf{x}_1$.
Key Takeaway: The marginal flow matches the distributions $p_0 \to p_1$ but follows curved trajectories. For straight lines, use conditional flow matching (Lipman et al.) with paired samples $(\mathbf{x}_0, \mathbf{x}_1)$.
Appendix F
To compute the total derivative of $\log p_t(\mathbf{x})$ along the probability flow defined by the vector field $\mathbf{u}_t(\mathbf{x})$, we use the formula:
\[\frac{d}{dt} \log p_t(\mathbf{x}) = \frac{\partial}{\partial t} \log p_t(\mathbf{x}) + \mathbf{u}_t(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \log p_t(\mathbf{x})\]where:
- $p_t(\mathbf{x}) = \mathcal{N}(\mathbf{x}; (2t-1)\boldsymbol{\mu}, \sigma_t^2 \mathbf{I})$,
- $\sigma_t^2 = 2t^2 - 2t + 1$,
- $\mathbf{u}_t(\mathbf{x}) = \dfrac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{\sigma_t^2}$,
- $\boldsymbol{\mu} = [10, 0]$, and the dimension $d = 2$.
Step 1: Compute $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$
The log-density is: \(\log p_t(\mathbf{x}) = -\frac{d}{2} \log(2\pi) - \frac{d}{2} \log(\sigma_t^2) - \frac{1}{2\sigma_t^2} \|\mathbf{x} - (2t-1)\boldsymbol{\mu}\|^2\) The gradient with respect to $\mathbf{x}$ is: \(\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) = -\frac{1}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu})\)
Step 2: Compute $\frac{\partial}{\partial t} \log p_t(\mathbf{x})$
Differentiate $\log p_t(\mathbf{x})$ with respect to $t$, treating $\mathbf{x}$ as fixed: \(\frac{\partial}{\partial t} \log p_t(\mathbf{x}) = -\frac{d}{2} \frac{1}{\sigma_t^2} \frac{\partial \sigma_t^2}{\partial t} - \frac{\partial}{\partial t} \left( \frac{1}{2\sigma_t^2} \|\mathbf{x} - (2t-1)\boldsymbol{\mu}\|^2 \right)\) where $\frac{\partial \sigma_t^2}{\partial t} = 4t - 2$. After simplification: \(\frac{\partial}{\partial t} \log p_t(\mathbf{x}) = -d \frac{2t-1}{\sigma_t^2} + \frac{2t-1}{\sigma_t^4} \|\mathbf{x} - (2t-1)\boldsymbol{\mu}\|^2 + \frac{2}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}) \cdot \boldsymbol{\mu}\)
Step 3: Compute $\mathbf{u}_t(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \log p_t(\mathbf{x})$
Substitute the expressions: \(\mathbf{u}_t(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) = \left( \frac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{\sigma_t^2} \right) \cdot \left( -\frac{1}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}) \right)\) Simplify to: \(\mathbf{u}_t(\mathbf{x}) \cdot \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) = -\frac{1}{\sigma_t^4} \left[ (2t-1)\mathbf{x} + \boldsymbol{\mu} \right] \cdot \left[ \mathbf{x} - (2t-1)\boldsymbol{\mu} \right]\)
Step 4: Sum the terms
Combine both parts: \(\frac{d}{dt} \log p_t(\mathbf{x}) = \left[ -d \frac{2t-1}{\sigma_t^2} + \frac{2t-1}{\sigma_t^4} \|\mathbf{x} - (2t-1)\boldsymbol{\mu}\|^2 + \frac{2}{\sigma_t^2} (\mathbf{x} - (2t-1)\boldsymbol{\mu}) \cdot \boldsymbol{\mu} \right] + \left[ -\frac{1}{\sigma_t^4} \left[ (2t-1)\mathbf{x} + \boldsymbol{\mu} \right] \cdot \left[ \mathbf{x} - (2t-1)\boldsymbol{\mu} \right] \right]\) After algebraic simplification (where all $\mathbf{x}$-dependent terms cancel), the result is: \(\frac{d}{dt} \log p_t(\mathbf{x}) = -d \frac{2t-1}{\sigma_t^2}\)
Final Result
For $d = 2$ and $\sigma_t^2 = 2t^2 - 2t + 1$: \(\boxed{\dfrac{d}{dt} \log p_{t}(\mathbf{x}) = -2 \cdot \dfrac{2t - 1}{2t^{2} - 2t + 1}}\)
Verification via Continuity Equation
The continuity equation requires: \(\frac{d}{dt} \log p_t(\mathbf{x}) = - \nabla \cdot \mathbf{u}_t(\mathbf{x})\) Compute the divergence: \(\nabla \cdot \mathbf{u}_t(\mathbf{x}) = \nabla \cdot \left( \dfrac{(2t-1)\mathbf{x} + \boldsymbol{\mu}}{\sigma_t^2} \right) = \frac{(2t-1)}{\sigma_t^2} \nabla \cdot \mathbf{x} = \frac{(2t-1) \cdot d}{\sigma_t^2}\) Thus: \(- \nabla \cdot \mathbf{u}_t(\mathbf{x}) = -d \frac{2t-1}{\sigma_t^2}\) which matches the result above, confirming correctness. The total derivative is independent of $\mathbf{x}$, a special property of this Gaussian flow.
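A finite-difference check of this result (my own sketch; `z` and `t` below are arbitrary test values). Since the exact trajectory follows $\frac{d\mathbf{x}}{dt} = \mathbf{u}_t(\mathbf{x})$, differencing $\log p_t(\mathbf{x}(t))$ along it approximates the total derivative:

```python
import numpy as np

mu = np.array([10.0, 0.0])
d = 2

def s2(t):
    # sigma_t^2 = 2t^2 - 2t + 1
    return 2 * t**2 - 2 * t + 1

def log_p(x, t):
    # log N(x; (2t-1)*mu, sigma_t^2 I) in d = 2 dimensions
    r = x - (2 * t - 1) * mu
    return -0.5 * d * np.log(2 * np.pi * s2(t)) - r @ r / (2 * s2(t))

def x_traj(t, z):
    # exact flow trajectory with z = x0 + mu
    return (2 * t - 1) * mu + np.sqrt(s2(t)) * z

z = np.array([1.3, -0.7])
t, h = 0.3, 1e-6
# centered finite difference of log p_t along the trajectory
num = (log_p(x_traj(t + h, z), t + h) - log_p(x_traj(t - h, z), t - h)) / (2 * h)
ana = -d * (2 * t - 1) / s2(t)
print(num, ana)
```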
Appendix G
The form $\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{\dot{\sigma}_t^{2}}{2\sigma_t^{2}} (\mathbf{x} - \boldsymbol{\mu}_t)$ holds generally for isotropic Gaussians $p_0 = \mathcal{N}(\boldsymbol{\mu}_0, \sigma_0^2 \mathbf{I})$ and $p_1 = \mathcal{N}(\boldsymbol{\mu}_1, \sigma_1^2 \mathbf{I})$ under independent coupling. Here’s the derivation and verification:
Step 1: Marginal Distribution at Time $t$
Under independent coupling:
- $\mathbf{x}_0 \sim \mathcal{N}(\boldsymbol{\mu}_0, \sigma_0^2 \mathbf{I})$
- $\mathbf{x}_1 \sim \mathcal{N}(\boldsymbol{\mu}_1, \sigma_1^2 \mathbf{I})$
With linear interpolation $\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$, the marginal is Gaussian: \(\mathbf{x}_t \sim \mathcal{N}(\boldsymbol{\mu}_t, \sigma_t^2 \mathbf{I}),\) where:
- Mean: $\boldsymbol{\mu}_t = (1-t)\boldsymbol{\mu}_0 + t\boldsymbol{\mu}_1$
- Variance: $\sigma_t^2 = (1-t)^2 \sigma_0^2 + t^2 \sigma_1^2$
Step 2: Conditional Expectations
Given $\mathbf{x}_t = \mathbf{x}$, the posterior expectations are: \(\mathbb{E}[\mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = \boldsymbol{\mu}_0 + \frac{(1-t)\sigma_0^2}{\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t)\) \(\mathbb{E}[\mathbf{x}_1 \mid \mathbf{x}_t = \mathbf{x}] = \boldsymbol{\mu}_1 + \frac{t\sigma_1^2}{\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t)\)
Step 3: Marginal Vector Field
\(\mathbf{u}_t(\mathbf{x}) = \mathbb{E}[\mathbf{x}_1 - \mathbf{x}_0 \mid \mathbf{x}_t = \mathbf{x}] = \left[ \boldsymbol{\mu}_1 + \frac{t\sigma_1^2}{\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t) \right] - \left[ \boldsymbol{\mu}_0 + \frac{(1-t)\sigma_0^2}{\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t) \right]\) Simplify: \(\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{t\sigma_1^2 - (1-t)\sigma_0^2}{\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t)\)
Step 4: Relate to $\dot{\sigma}_t^2$
Compute the derivative of $\sigma_t^2$: \(\dot{\sigma}_t^2 = \frac{d}{dt} \left[ (1-t)^2 \sigma_0^2 + t^2 \sigma_1^2 \right] = -2(1-t)\sigma_0^2 + 2t\sigma_1^2 = 2 \left[ t\sigma_1^2 - (1-t)\sigma_0^2 \right]\) Thus: \(t\sigma_1^2 - (1-t)\sigma_0^2 = \frac{\dot{\sigma}_t^2}{2}\) Substitute into $\mathbf{u}_t(\mathbf{x})$: \(\boxed{\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \dfrac{\dot{\sigma}_t^{2}}{2\sigma_t^{2}} (\mathbf{x} - \boldsymbol{\mu}_t)}\)
Verification at Boundaries
- At $t = 0$:
- $\sigma_t^2 = \sigma_0^2$, $\dot{\sigma}_t^2 = -2\sigma_0^2$, $\boldsymbol{\mu}_t = \boldsymbol{\mu}_0$
- $\mathbf{u}_0(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{-2\sigma_0^2}{2\sigma_0^2} (\mathbf{x} - \boldsymbol{\mu}_0) = \boldsymbol{\mu}_1 - \mathbf{x}$
- Matches $\mathbb{E}[\mathbf{x}_1 - \mathbf{x}_0 \mid \mathbf{x}_0 = \mathbf{x}] = \boldsymbol{\mu}_1 - \mathbf{x}$.
- At $t = 1$:
- $\sigma_t^2 = \sigma_1^2$, $\dot{\sigma}_t^2 = 2\sigma_1^2$, $\boldsymbol{\mu}_t = \boldsymbol{\mu}_1$
- $\mathbf{u}_1(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{2\sigma_1^2}{2\sigma_1^2} (\mathbf{x} - \boldsymbol{\mu}_1) = \mathbf{x} - \boldsymbol{\mu}_0$
- Matches $\mathbb{E}[\mathbf{x}_1 - \mathbf{x}_0 \mid \mathbf{x}_1 = \mathbf{x}] = \mathbf{x} - \boldsymbol{\mu}_0$.
Key Observations
- Generalization: The form holds for arbitrary $\sigma_0^2, \sigma_1^2 > 0$, reducing to the unit-variance case when $\sigma_0^2 = \sigma_1^2 = 1$.
- Role of $\dot{\sigma}_t^2$: The term $\frac{\dot{\sigma}_t^2}{2\sigma_t^2}$ captures the time-dependent scaling of the drift relative to the current dispersion $\sigma_t^2$.
- Interpretation: The vector field transports mass from $p_0$ to $p_1$ by:
- A constant velocity $(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0)$ (mean shift),
- A position-dependent correction that contracts/expands dispersion based on $\dot{\sigma}_t^2$.
This result is consistent with probability flow ODEs in diffusion models and holds for any isotropic Gaussians under independent coupling.
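A Monte Carlo check of the boxed field in 1-D (my own sketch; all parameter values below are arbitrary): estimate $\mathbb{E}[x_1 - x_0 \mid x_t \approx x^*]$ by rejection and compare with the closed form.

```python
import numpy as np

rng = np.random.default_rng(0)
mu0, mu1, s0, s1 = -1.0, 1.0, 1.0, 2.0   # arbitrary 1-D test values
t, x_star = 0.3, 0.5

n = 2_000_000
x0 = rng.normal(mu0, s0, n)
x1 = rng.normal(mu1, s1, n)               # independent coupling
xt = (1 - t) * x0 + t * x1

# Monte Carlo estimate of E[x1 - x0 | x_t ~= x_star]
sel = np.abs(xt - x_star) < 0.05
mc = (x1[sel] - x0[sel]).mean()

# closed form: u_t(x) = (mu1 - mu0) + sigma_dot^2 / (2 sigma_t^2) * (x - mu_t)
mu_t = (1 - t) * mu0 + t * mu1
s2_t = (1 - t)**2 * s0**2 + t**2 * s1**2
s2_dot = -2 * (1 - t) * s0**2 + 2 * t * s1**2
closed = (mu1 - mu0) + s2_dot / (2 * s2_t) * (x_star - mu_t)
print(mc, closed)
```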
Appendix H
To solve the ordinary differential equation (ODE): \(\frac{d\mathbf{x}}{dt} = \mathbf{u}_t(\mathbf{x})\) where, under independent coupling, the vector field $\mathbf{u}_t(\mathbf{x})$ is defined as: \(\mathbf{u}_t(\mathbf{x}) = \dot{\boldsymbol{\mu}}_t + \frac{\dot{\sigma}_t^{2}}{2\sigma_t^{2}} (\mathbf{x} - \boldsymbol{\mu}_t)\) with initial condition $\mathbf{x}(0) = \mathbf{x}_0$. Here, $\boldsymbol{\mu}_t = (1-t)\boldsymbol{\mu}_0 + t\boldsymbol{\mu}_1$, $\sigma_t^2 = (1-t)^2 \sigma_0^2 + t^2 \sigma_1^2$, and $\dot{\sigma}_t^2 = \frac{d}{dt}\sigma_t^2 = -2(1-t)\sigma_0^2 + 2t\sigma_1^2$.
Derivation
Step 1: Change of Variables
Let $\mathbf{y} = \mathbf{x} - \boldsymbol{\mu}_t$. Then: \(\frac{d\mathbf{y}}{dt} = \frac{d\mathbf{x}}{dt} - \frac{d\boldsymbol{\mu}_t}{dt}\) Compute $\frac{d\boldsymbol{\mu}_t}{dt}$: \(\frac{d\boldsymbol{\mu}_t}{dt} = -\boldsymbol{\mu}_0 + \boldsymbol{\mu}_1 = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_0\) Substituting into the ODE: \(\frac{d\mathbf{y}}{dt} = \left[ (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{\dot{\sigma}_t^{2}}{2\sigma_t^{2}} \mathbf{y} \right] - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) = \frac{\dot{\sigma}_t^{2}}{2\sigma_t^{2}} \mathbf{y}\)
Step 2: Solve the Simplified ODE
The equation is separable: \(\frac{d\mathbf{y}}{\mathbf{y}} = \frac{\dot{\sigma}_t^{2}}{2\sigma_t^{2}} dt\) where $\frac{\dot{\sigma}_t^{2}}{2\sigma_t^{2}} = \frac{1}{2} \frac{d}{dt} \ln \sigma_t^2$. Integrating both sides: \(\int \frac{d\mathbf{y}}{\mathbf{y}} = \frac{1}{2} \int d(\ln \sigma_t^2)\) gives \(\ln |\mathbf{y}| = \frac{1}{2} \ln \sigma_t^2 + C\) with integration constant $C$. Solving for $\mathbf{y}$: \(\mathbf{y}(t) = \mathbf{y}(0) \exp\left( \frac{1}{2} \ln \frac{\sigma_t^2}{\sigma_0^2} \right) = \mathbf{y}(0) \left( \frac{\sigma_t^2}{\sigma_0^2} \right)^{1/2} = \mathbf{y}(0) \frac{\sigma_t}{\sigma_0}\) where $\sigma_t = \sqrt{\sigma_t^2}$ and $\sigma_0 = \sqrt{\sigma_0^2}$ are standard deviations.
Step 3: Apply the Initial Condition
At $t = 0$, $\mathbf{y}(0) = \mathbf{x}(0) - \boldsymbol{\mu}_0 = \mathbf{x}_0 - \boldsymbol{\mu}_0$. Therefore: \(\mathbf{y}(t) = (\mathbf{x}_0 - \boldsymbol{\mu}_0) \frac{\sigma_t}{\sigma_0}\)
Step 4: Transform Back
From $\mathbf{y}(t) = \mathbf{x}(t) - \boldsymbol{\mu}_t$: \(\mathbf{x}(t) = \boldsymbol{\mu}_t + (\mathbf{x}_0 - \boldsymbol{\mu}_0) \frac{\sigma_t}{\sigma_0}\) Substituting $\boldsymbol{\mu}_t = (1-t)\boldsymbol{\mu}_0 + t\boldsymbol{\mu}_1$ and $\sigma_t = \sqrt{(1-t)^2 \sigma_0^2 + t^2 \sigma_1^2}$, the final solution is: \(\mathbf{x}(t) = \left[(1-t) \boldsymbol{\mu}_0 + t \boldsymbol{\mu}_1\right] + \frac{\sqrt{(1-t)^2 \sigma_0^2 + t^2 \sigma_1^2}}{\sigma_0} (\mathbf{x}_0 - \boldsymbol{\mu}_0)\)
Verification
- Initial condition $t = 0$:
$\boldsymbol{\mu}_t = \boldsymbol{\mu}_0$, $\sigma_t = \sigma_0$, so $\mathbf{x}(0) = \boldsymbol{\mu}_0 + \frac{\sigma_0}{\sigma_0} (\mathbf{x}_0 - \boldsymbol{\mu}_0) = \mathbf{x}_0$, satisfying the initial condition.
- Distribution check:
If $\mathbf{x}_0 \sim \mathcal{N}(\boldsymbol{\mu}_0, \sigma_0^2 \mathbf{I})$, then $\mathbf{x}(t) \sim \mathcal{N}(\boldsymbol{\mu}_t, \sigma_t^2 \mathbf{I})$, matching the marginal distribution required by flow matching.
Final Solution
\(\boxed{\mathbf{x}(t) = \left[(1-t) \boldsymbol{\mu}_0 + t \boldsymbol{\mu}_1\right] + \dfrac{\sqrt{(1-t)^2 \sigma_0^2 + t^2 \sigma_1^2}}{\sigma_0} (\mathbf{x}_0 - \boldsymbol{\mu}_0)}\)
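A finite-difference check (added by me) that the boxed solution indeed satisfies $\frac{d\mathbf{x}}{dt} = \mathbf{u}_t(\mathbf{x})$; the test values below are arbitrary:

```python
import numpy as np

mu0, mu1 = np.array([-1.0, 0.0]), np.array([2.0, 1.0])   # arbitrary test values
s0, s1 = 1.0, 3.0

def mu_t(t): return (1 - t) * mu0 + t * mu1
def s2(t):   return (1 - t)**2 * s0**2 + t**2 * s1**2
def s2dot(t): return -2 * (1 - t) * s0**2 + 2 * t * s1**2

def u(x, t):
    # vector field u_t(x) = (mu1 - mu0) + sigma_dot^2 / (2 sigma_t^2) * (x - mu_t)
    return (mu1 - mu0) + s2dot(t) / (2 * s2(t)) * (x - mu_t(t))

def x_closed(t, x0):
    # boxed closed-form solution
    return mu_t(t) + np.sqrt(s2(t)) / s0 * (x0 - mu0)

x0 = np.array([0.5, -0.4])
t, h = 0.6, 1e-6
lhs = (x_closed(t + h, x0) - x_closed(t - h, x0)) / (2 * h)   # dx/dt, centered diff
rhs = u(x_closed(t, x0), t)
print(lhs, rhs)
```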
Appendix I
To compute the marginal vector field $\mathbf{u}_t(\mathbf{x})$ for flow matching between two Gaussians $p_0 = \mathcal{N}(\boldsymbol{\mu}_0, \Sigma_0)$ and $p_1 = \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1)$ under independent coupling (cross-covariance = 0), we start from the conditional vector field $\mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1)$ and derive the marginal field through expectation. The conditional field is derived from the straight-line path $\mathbf{x}_t = (1-t)\mathbf{x}_0 + t\mathbf{x}_1$:
Step 1: Conditional Vector Field
The time derivative of the path gives the conditional vector field: \(\mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1) = \frac{d\mathbf{x}_t}{dt} = \mathbf{x}_1 - \mathbf{x}_0\) Expressing $\mathbf{x}_0$ in terms of $\mathbf{x}_t$ and $\mathbf{x}_1$: \(\mathbf{x}_0 = \frac{\mathbf{x}_t - t\mathbf{x}_1}{1-t}\) Substitute to eliminate $\mathbf{x}_0$: \(\mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1) = \mathbf{x}_1 - \frac{\mathbf{x} - t\mathbf{x}_1}{1-t} = \frac{\mathbf{x}_1 - \mathbf{x}}{1-t}\)
Step 2: Marginal Vector Field
The marginal vector field is the expectation over $\mathbf{x}_1$ conditioned on $\mathbf{x}$: \(\mathbf{u}_t(\mathbf{x}) = \mathbb{E}_{p_t(\mathbf{x}_1 \mid \mathbf{x})} \left[ \mathbf{u}_t(\mathbf{x} \mid \mathbf{x}_1) \right] = \mathbb{E}_{p_t(\mathbf{x}_1 \mid \mathbf{x})} \left[ \frac{\mathbf{x}_1 - \mathbf{x}}{1-t} \right] = \frac{1}{1-t} \left( \mathbb{E}[\mathbf{x}_1 \mid \mathbf{x}] - \mathbf{x} \right)\)
Step 3: Compute $\mathbb{E}[\mathbf{x}_1 \mid \mathbf{x}]$
Under independent coupling, the joint distribution of $\mathbf{x}_t$ and $\mathbf{x}_1$ is Gaussian: \(\begin{pmatrix} \mathbf{x}_t \\ \mathbf{x}_1 \end{pmatrix} \sim \mathcal{N} \left( \begin{pmatrix} \boldsymbol{\mu}_t \\ \boldsymbol{\mu}_1 \end{pmatrix}, \begin{pmatrix} \Sigma_t & \text{Cov}(\mathbf{x}_t, \mathbf{x}_1) \\ \text{Cov}(\mathbf{x}_1, \mathbf{x}_t) & \Sigma_1 \end{pmatrix} \right)\) where:
- $\boldsymbol{\mu}_t = (1-t)\boldsymbol{\mu}_0 + t\boldsymbol{\mu}_1$
- $\Sigma_t = (1-t)^2 \Sigma_0 + t^2 \Sigma_1$ (independent coupling)
- $\text{Cov}(\mathbf{x}_t, \mathbf{x}_1) = t\Sigma_1$ (since $\mathbf{x}_0$ and $\mathbf{x}_1$ are independent)
The conditional expectation is: \(\mathbb{E}[\mathbf{x}_1 \mid \mathbf{x}] = \boldsymbol{\mu}_1 + \text{Cov}(\mathbf{x}_1, \mathbf{x}_t) \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t) = \boldsymbol{\mu}_1 + t\Sigma_1 \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t)\)
Step 4: Substitute into $\mathbf{u}_t(\mathbf{x})$
\(\mathbf{u}_t(\mathbf{x}) = \frac{1}{1-t} \left( \boldsymbol{\mu}_1 + t\Sigma_1 \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t) - \mathbf{x} \right)\) Rewrite $\boldsymbol{\mu}_1 - \mathbf{x}$ as: \(\boldsymbol{\mu}_1 - \mathbf{x} = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_t) + (\boldsymbol{\mu}_t - \mathbf{x}) = (1-t)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + (\boldsymbol{\mu}_t - \mathbf{x})\) Substitute: \(\mathbf{u}_t(\mathbf{x}) = \frac{1}{1-t} \left( (1-t)(\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + (\boldsymbol{\mu}_t - \mathbf{x}) + t\Sigma_1 \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t) \right)\) Simplify: \(\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{1}{1-t} \left( -(\mathbf{x} - \boldsymbol{\mu}_t) + t\Sigma_1 \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t) \right)\) Factor: \(\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{ -I + t\Sigma_1 \Sigma_t^{-1} }{1-t} (\mathbf{x} - \boldsymbol{\mu}_t)\)
Step 5: Verify Consistency
Using the continuity equation for the Gaussian path $p_t = \mathcal{N}(\boldsymbol{\mu}_t, \Sigma_t)$: \(\mathbf{u}_t(\mathbf{x}) = \dot{\boldsymbol{\mu}}_t + \frac{1}{2} \dot{\Sigma}_t \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t)\) where:
- $\dot{\boldsymbol{\mu}}_t = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_0$
- $\dot{\Sigma}_t = -2(1-t)\Sigma_0 + 2t\Sigma_1$
Substitute: \(\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{1}{2} \left( -2(1-t)\Sigma_0 + 2t\Sigma_1 \right) \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \left( t\Sigma_1 - (1-t)\Sigma_0 \right) \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t)\) This matches the expression from Step 4 since: \(t\Sigma_1 - (1-t)\Sigma_0 = \frac{ -I + t\Sigma_1 \Sigma_t^{-1} }{1-t} \cdot \Sigma_t\)
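The matrix identity invoked here is pure algebra ($t\Sigma_1 - \Sigma_t = (1-t)\left[t\Sigma_1 - (1-t)\Sigma_0\right]$), so it holds even for non-commuting covariances; a quick numeric check with random SPD matrices (my own sketch):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(d):
    a = rng.normal(size=(d, d))
    return a @ a.T + d * np.eye(d)   # symmetric positive definite

d = 3
S0, S1 = random_spd(d), random_spd(d)
t = 0.4
St = (1 - t)**2 * S0 + t**2 * S1

lhs = t * S1 - (1 - t) * S0
# ((-I + t S1 St^{-1}) / (1 - t)) St
rhs = (t * S1 @ np.linalg.inv(St) - np.eye(d)) / (1 - t) @ St
print(np.abs(lhs - rhs).max())
```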
Final Result
The marginal vector field for independent coupling is: \(\boxed{\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \left( t \Sigma_1 - (1-t) \Sigma_0 \right) \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t)}\) where $\Sigma_t = (1-t)^2 \Sigma_0 + t^2 \Sigma_1$.
Special Case: Isotropic Gaussians
If $\Sigma_0 = \sigma_0^2 \mathbf{I}$ and $\Sigma_1 = \sigma_1^2 \mathbf{I}$: \(\Sigma_t = \sigma_t^2 \mathbf{I}, \quad \sigma_t^2 = (1-t)^2 \sigma_0^2 + t^2 \sigma_1^2\) \(\dot{\sigma}_t^2 = -2(1-t)\sigma_0^2 + 2t\sigma_1^2\) The vector field simplifies to: \(\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \frac{\dot{\sigma}_t^2}{2\sigma_t^2} (\mathbf{x} - \boldsymbol{\mu}_t)\)
Appendix J: General Gaussian Flow
To determine the trajectory $\mathbf{x}(t)$ for the flow defined by the vector field $\mathbf{u}_t(\mathbf{x})$ under independent coupling (cross-covariance = 0) between Gaussians $p_0 = \mathcal{N}(\boldsymbol{\mu}_0, \Sigma_0)$ and $p_1 = \mathcal{N}(\boldsymbol{\mu}_1, \Sigma_1)$, we start from the given vector field and solve the associated ordinary differential equation (ODE). The vector field is:
\[\mathbf{u}_t(\mathbf{x}) = (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \left( t \Sigma_1 - (1-t) \Sigma_0 \right) \Sigma_t^{-1} (\mathbf{x} - \boldsymbol{\mu}_t),\]where:
- $\boldsymbol{\mu}_t = (1-t) \boldsymbol{\mu}_0 + t \boldsymbol{\mu}_1$,
- $\Sigma_t = (1-t)^2 \Sigma_0 + t^2 \Sigma_1$.
The trajectory $\mathbf{x}(t)$ satisfies the ODE: \(\frac{d\mathbf{x}}{dt} = \mathbf{u}_t(\mathbf{x}), \quad \mathbf{x}(0) = \mathbf{x}_0.\)
This is a linear, non-autonomous ODE. To solve it, we decompose $\mathbf{x}(t)$ into its mean and deviation components. Define: \(\mathbf{y}(t) = \mathbf{x}(t) - \boldsymbol{\mu}_t,\) where $\mathbf{y}(t)$ represents the deviation from the time-dependent mean $\boldsymbol{\mu}_t$. The initial condition is $\mathbf{y}(0) = \mathbf{x}_0 - \boldsymbol{\mu}_0$.
Step 1: Derive the ODE for $\mathbf{y}(t)$
Differentiate $\mathbf{y}(t)$: \(\frac{d\mathbf{y}}{dt} = \frac{d\mathbf{x}}{dt} - \dot{\boldsymbol{\mu}}_t,\) where $\dot{\boldsymbol{\mu}}_t = \frac{d\boldsymbol{\mu}_t}{dt} = \boldsymbol{\mu}_1 - \boldsymbol{\mu}_0$. Substitute the ODE for $\frac{d\mathbf{x}}{dt}$: \(\frac{d\mathbf{y}}{dt} = \mathbf{u}_t(\mathbf{x}) - \dot{\boldsymbol{\mu}}_t = \left[ (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) + \left( t \Sigma_1 - (1-t) \Sigma_0 \right) \Sigma_t^{-1} \mathbf{y} \right] - (\boldsymbol{\mu}_1 - \boldsymbol{\mu}_0) = \left( t \Sigma_1 - (1-t) \Sigma_0 \right) \Sigma_t^{-1} \mathbf{y}.\)
Simplify the coefficient: \(\mathbf{B}_t = \left( t \Sigma_1 - (1-t) \Sigma_0 \right) \Sigma_t^{-1}.\) Thus, the ODE for $\mathbf{y}(t)$ is: \(\frac{d\mathbf{y}}{dt} = \mathbf{B}_t \mathbf{y}, \quad \mathbf{y}(0) = \mathbf{x}_0 - \boldsymbol{\mu}_0.\)
Step 2: Solve the ODE for $\mathbf{y}(t)$
The solution to $\frac{d\mathbf{y}}{dt} = \mathbf{B}_t \mathbf{y}$ is: \(\mathbf{y}(t) = \mathbf{C}(t) \mathbf{y}(0),\) where $\mathbf{C}(t)$ is the fundamental matrix satisfying: \(\frac{d\mathbf{C}}{dt} = \mathbf{B}_t \mathbf{C}, \quad \mathbf{C}(0) = \mathbf{I}.\)
Step 3: Express $\mathbf{x}(t)$
Substitute back to $\mathbf{x}(t)$: \(\mathbf{x}(t) = \boldsymbol{\mu}_t + \mathbf{y}(t) = \boldsymbol{\mu}_t + \mathbf{C}(t) (\mathbf{x}_0 - \boldsymbol{\mu}_0).\)
Step 4: Closed-form solution under commutativity (if applicable)
If $\Sigma_0$ and $\Sigma_1$ commute (i.e., $\Sigma_0 \Sigma_1 = \Sigma_1 \Sigma_0$), then $\mathbf{C}(t)$ simplifies to: \(\mathbf{C}(t) = \Sigma_t^{1/2} \Sigma_0^{-1/2},\) and the solution becomes: \(\mathbf{x}(t) = \boldsymbol{\mu}_t + \Sigma_t^{1/2} \Sigma_0^{-1/2} (\mathbf{x}_0 - \boldsymbol{\mu}_0).\) This holds because $\frac{d}{dt}(\Sigma_t^{1/2}) = \frac{1}{2} \dot{\Sigma}_t \Sigma_t^{-1/2}$ when $\Sigma_0$ and $\Sigma_1$ commute, satisfying the ODE for $\mathbf{C}(t)$.
Final Result
The trajectory $\mathbf{x}(t)$ is: \(\boxed{\mathbf{x}(t) = \boldsymbol{\mu}_t + \mathbf{C}(t) \left( \mathbf{x}_0 - \boldsymbol{\mu}_0 \right)}\) where:
- $\boldsymbol{\mu}_t = (1-t) \boldsymbol{\mu}_0 + t \boldsymbol{\mu}_1$,
- $\Sigma_t = (1-t)^2 \Sigma_0 + t^2 \Sigma_1$,
- $\mathbf{C}(t)$ solves the matrix ODE: \(\frac{d\mathbf{C}}{dt} = \left( t \Sigma_1 - (1-t) \Sigma_0 \right) \Sigma_t^{-1} \mathbf{C}, \quad \mathbf{C}(0) = \mathbf{I}.\)
If $\Sigma_0$ and $\Sigma_1$ commute, this simplifies to: \(\boxed{\mathbf{x}(t) = \boldsymbol{\mu}_t + \Sigma_t^{1/2} \Sigma_0^{-1/2} \left( \mathbf{x}_0 - \boldsymbol{\mu}_0 \right)}\)
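A numeric check of the commuting case (my own sketch, with arbitrary diagonal covariances): Euler-integrate the matrix ODE for $\mathbf{C}(t)$ and compare with $\Sigma_t^{1/2} \Sigma_0^{-1/2}$ at $t=1$.

```python
import numpy as np

S0 = np.diag([1.0, 4.0])       # commuting (diagonal) test covariances
S1 = np.diag([9.0, 1.0])

def St(t):
    return (1 - t)**2 * S0 + t**2 * S1

def B(t):
    # B_t = (t S1 - (1-t) S0) St^{-1}
    return (t * S1 - (1 - t) * S0) @ np.linalg.inv(St(t))

# Euler integration of dC/dt = B_t C, C(0) = I
n = 20000
dt = 1.0 / n
C = np.eye(2)
for k in range(n):
    C = C + dt * B(k * dt) @ C

# closed form Sigma_t^{1/2} Sigma_0^{-1/2} at t = 1
# (elementwise sqrt is valid here because the matrices are diagonal)
closed = np.sqrt(St(1.0)) @ np.linalg.inv(np.sqrt(S0))
print(np.abs(C - closed).max())
```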
Key Notes:
- General Case: The ODE for $\mathbf{C}(t)$ must be solved numerically if $\Sigma_0$ and $\Sigma_1$ do not commute.
- Behavior: The solution ensures that the marginal distribution of $\mathbf{x}(t)$ is $\mathcal{N}(\boldsymbol{\mu}_t, \Sigma_t)$ under independent coupling.
- Initial Condition: At $t=0$, $\mathbf{x}(0) = \boldsymbol{\mu}_0 + \mathbf{I}(\mathbf{x}_0 - \boldsymbol{\mu}_0) = \mathbf{x}_0$.
- Endpoint: At $t=1$, $\mathbf{x}(1) = \boldsymbol{\mu}_1 + \mathbf{C}(1) (\mathbf{x}_0 - \boldsymbol{\mu}_0)$, where $\mathbf{C}(1)$ depends on the solution to the ODE.