Takeaways
| Item | Quantity | View | Diff. equation | Stochastic? | Note |
| --- | --- | --- | --- | --- | --- |
| Fluid-dynamics flow equation | $\Phi$ | macroscopic | PDE | no, averaged | averaged because of diffusion |
| Electromagnetics continuity equation | $\rho$ | macroscopic | PDE | no, deterministic | drift only |
| Fokker-Planck equation | $p(x,t)$ | macroscopic | PDE | no, averaged | averaged because of diffusion |
| $d \log p/dt$ | $\log p$ | macroscopic | TDE | no, averaged | averaged because of diffusion |
| Flow equation ($D=0$) | $x_t, \mathbf{u}$ | microscopic | PDE or TDE | no, deterministic | drift only, but the distributions at the two ends are random |
| Fwd/bwd diffusion | $x_t$ | microscopic | SDE | yes, stochastic | drift + diffusion |

PDE/TDE are often called ODEs (a misnomer) by way of contrast with SDEs.
Starting from physics: the fluid-dynamics transport equation
The transport equation (or convection-diffusion equation) describes how a scalar quantity (such as temperature, concentration, or a momentum component) is transported within a fluid. Its general form includes terms for convection, diffusion, and sources or sinks: \[\frac{\partial \Phi}{\partial t}=\nabla \cdot(\mathbf{u} \Phi)-\nabla \cdot(\Gamma \nabla \Phi)+S\]
where $\Phi$ is the scalar quantity, $\mathbf{u}$ is the velocity vector, $\Gamma$ is the diffusion coefficient, and $S$ is a source term.
Let us examine the physical meaning further. First assume no sources or sinks, $S=0$; $\Phi$ is then a conserved scalar, e.g., a concentration. \(\frac{\partial \Phi}{\partial t}=\nabla \cdot(\mathbf{u} \Phi)-\nabla \cdot(\Gamma \nabla \Phi) = - \nabla \cdot(-\mathbf{u} \Phi +\Gamma \nabla \Phi)\) The left-hand side is the rate of change of the scalar in time; the right-hand side is the net amount flowing out of (or into) a region of space. It splits into two terms: a drift term, the flow driven by an external field, and a diffusion term, the (random) spreading driven by concentration differences. If $\Phi$ decreases in time (negative), the divergence is positive, hence the minus sign. Here the positive direction of $\mathbf{u}$ is defined inward, which cancels the local minus sign.
In the semiconductor transport equation, drift and diffusion cancel each other at equilibrium, so the two terms carry opposite signs.
In electromagnetics and in flow matching, the flow is conventionally defined with the opposite sign, giving
\[\frac{\partial \Phi}{\partial t}=-\nabla \cdot(\mathbf{u} \Phi)-\nabla \cdot(\Gamma \nabla \Phi) = - \nabla \cdot(\mathbf{u} \Phi +\Gamma \nabla \Phi)\]

Macroscopic vs. microscopic, PDE/ODE vs. SDE, $\boldsymbol{x}_t$ vs. $p(x, t)$
The Fokker-Planck partial differential equation
In the diffusion processes and flow methods of generative AI, the conserved quantity is probability (the total probability is 1 at every time): $\Phi = p(x, t)$, with no source, $S=0$. Ordinary physical diffusion runs from high concentration to low, whereas probability diffuses from low probability to high; this can be viewed as a negative diffusion constant, $D = -\Gamma$. Assuming isotropic diffusion, $D(t)$ is independent of position but may depend on time (noise scheduling). The Fokker-Planck equation is usually written: \(\frac{\partial p(x,t)}{\partial t}=-\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]+\nabla \cdot[D(t) \nabla p(x,t)] = -\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]+D(t) \Delta p(x,t)\)
An alternative form:
\(\begin{aligned} \frac{\partial\log p(x,t)}{\partial t}&= \frac{1}{p(x,t)}\frac{\partial p(x,t)}{\partial t}\\ &= -\frac{\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]}{p(x,t)}+D(t) \frac{\Delta p(x,t)}{p(x,t)}\\ &= -\nabla \cdot \mathbf{u}(x,t)-\mathbf{u}(x,t)\cdot \nabla\log p(x,t) +D(t) \frac{\Delta p(x,t)}{p(x,t)}\\ &= -\mathbf{u}(x,t)\cdot \nabla\log p(x,t) \underbrace{- \nabla \cdot \mathbf{u}(x,t) +D(t)\left[\Delta \log p(x,t) + \| \nabla \log p(x,t) \|^2\right]}_{=\frac{d\log p(x,t)}{d t}}\\ \end{aligned}\)
A few standard identities as sanity checks:
Conservation of probability:
\(\frac{d}{dt} \int p(x,t)\, dx = \int \frac{\partial p(x,t)}{\partial t}\, dx = 0\) \(\mathbb{E}_{p} \left[ \frac{\partial}{\partial t} \log p(x,t) \right] = \int p(x,t) \cdot \frac{\partial}{\partial t} \log p(x,t)\, dx = \int \frac{\partial p(x,t)}{\partial t}\, dx = 0\) The score function has zero mean: $\mathbb{E}(\nabla \log p)=0$. This holds for every distribution (Appendix S), not just Gaussians!
Fisher information $\ge 0$:$I(p) = \mathbb{E}(|\nabla \log p|^2)=\int p |\nabla \log p|^2 dx\, \ge 0$ .
Entropy: $H(t) = - \mathbb{E}_p\left[\log p(x,t)\right] = -\int p(x,t)\log p(x,t)dx$ (differential entropy; usually positive for a well spread-out density, though not guaranteed in general)
Time evolution of entropy: \(\begin{aligned} \frac{d H(t)}{dt} &= -\int \frac{\partial p(x,t)}{\partial t}\log p(x,t)dx \\ &= \int \left[\nabla \cdot[\boldsymbol{u}(x,t)\, p(x,t)]-D(t) \Delta p(x,t)\right] \log p(x,t)dx \\ &= \underbrace{\mathbb{E}[\nabla \cdot \boldsymbol{u}(x,t)]}_{\text{Drift contribution}} + \underbrace{D(t) I(p)}_{\text{Diffusion contribution}}, \end{aligned}\)
- Diffusion only: $\ge 0$
- Drift only: can be $> 0$ (expanding flow), $< 0$ (compressing flow), or $= 0$ (incompressible flow)
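As a quick numerical sanity check of the identities above, the sketch below (a minimal NumPy example; the values of `mu` and `sigma` are arbitrary, and the 1-D Gaussian is just a convenient test case) estimates the score mean and the Fisher information by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma = 1.5, 0.7                      # arbitrary Gaussian parameters
x = rng.normal(mu, sigma, size=1_000_000)

# Score of N(mu, sigma^2): grad log p(x) = -(x - mu) / sigma^2
score = -(x - mu) / sigma**2

print(score.mean())        # ~0: E[grad log p] = 0 (holds for any distribution)
print((score**2).mean())   # ~1/sigma^2 ~ 2.04: the Fisher information I(p)
```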
Microscopic samples $x_t$: the Itô SDE
The Fokker-Planck equation is a PDE describing the (macroscopic) average flow $p(x,t)$. The other route is the Itô SDE, which describes the (microscopic) random motion $x_t$ of individual samples. An individual sample is necessarily a function of $t$, hence the total-derivative notation.
The general Itô SDE takes the form below. Here $\sigma(\boldsymbol{x}_t, t)$ is the "incremental" additive noise; we will later rename it $g(t)$, because it is easily confused with the "total" additive Gaussian noise in $\boldsymbol{x}_t = \boldsymbol{x}_0 + \sigma(t) \boldsymbol{z}_t$ that appears later! \(d \boldsymbol{x}_t = \mathbf{u}(\boldsymbol{x}_t, t) d t+\sigma(\boldsymbol{x}_t, t) d \boldsymbol{w}_t\) The corresponding Fokker-Planck equation is \(\frac{\partial p(x,t)}{\partial t}= -\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]+ \Delta [D(x, t) p(x,t)]\quad \text{ where }D(x,t) = \frac{\sigma^2(x,t)}{2}\)
[!Check the mean flow field]
Check that $\mathbf{u}$ is the mean flow, consistent with Fokker-Planck: \(\mathbf{u}(\boldsymbol{x}_t, t) = \mathbb{E}\left[\frac{d \boldsymbol{x}_t}{dt}\,\middle|\,\boldsymbol{x}_t\right]\)
We now specialize to the isotropic form, replacing $\sigma(\boldsymbol{x}_t, t)$ with $g(t)$ and writing $\boldsymbol{u}(\boldsymbol{x}_t,t)$ as $\boldsymbol{f}(\boldsymbol{x}_t,t)$. \(d \boldsymbol{x}_t = \boldsymbol{f}(\boldsymbol{x}_t, t) d t+g(t) d \boldsymbol{w}_t\quad \text{ where }D(t) = \frac{g^2(t)}{2}\) The corresponding PDE: \(\frac{\partial p(x,t)}{\partial t}=-\nabla \cdot[\boldsymbol{f}(x,t)\, p(x,t)]+\frac{g^2(t)}{2} \Delta p(x,t)\) An alternative form: \(\begin{aligned} \frac{\partial\log p(x,t)}{\partial t}&= \frac{1}{p(x,t)}\frac{\partial p(x,t)}{\partial t}\\ &= -\boldsymbol{f}(x,t)\cdot \nabla\log p(x,t) \underbrace{- \nabla \cdot \boldsymbol{f}(x,t) +\frac{g^2(t)}{2}\left[\Delta \log p(x,t) + \| \nabla \log p(x,t) \|^2\right]}_{=\frac{d\log p(x,t)}{d t}}\\ \end{aligned}\)
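A minimal Euler-Maruyama sketch for simulating sample paths of this SDE; the drift and noise below are illustrative stand-ins (an OU drift with $\beta = 1$ and constant $g = 1$), not anything prescribed by the text.

```python
import numpy as np

def euler_maruyama(f, g, x0, T=1.0, n_steps=1000, n_paths=100_000, seed=0):
    """Simulate dx = f(x, t) dt + g(t) dw with the Euler-Maruyama scheme."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = np.full(n_paths, x0, dtype=float)
    for i in range(n_steps):
        t = i * dt
        dw = rng.normal(0.0, np.sqrt(dt), size=n_paths)  # dw ~ N(0, dt)
        x += f(x, t) * dt + g(t) * dw
    return x

# OU drift f = -x/2 (beta = 1), constant g = 1, started at x0 = 2
x_T = euler_maruyama(f=lambda x, t: -0.5 * x, g=lambda t: 1.0, x0=2.0)
print(x_T.mean(), x_T.var())  # ~2 e^{-1/2} ~ 1.21 and ~1 - e^{-1} ~ 0.63,
                              # matching the Gaussian mu(t), sigma^2(t) derived below
```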
Avoid the total-derivative (ODE) form whenever possible! \(\frac{d \log p(x,t)}{d t}=-\nabla \cdot\boldsymbol{f}(x,t)+\frac{g^2(t)}{2} [ \Delta \log p(x,t) + \|\nabla\log p(x,t)\|^2]\) The exception is when studying flow-based methods or properties along a particle trajectory $x(t)$, for example:
\[\log p(x_t)=\log p(x_0)+\int_0^t \frac{d \log p(x,t)}{d t}dt\]The link between the Fokker-Planck PDE (flow) and the Itô SDE (samples) has many applications:
- Stochastic thermodynamics: microscopic thermodynamics and nonequilibrium dynamics (later section).
- Generative modeling: score-based models approximate $\nabla \log p$, while flow matching learns the velocity field $\boldsymbol{f}$ (separate article).
- Information geometry: $|\nabla \log p|^2$ is the Fisher information density.
- Schrödinger bridge (transforming one distribution into another): this term appears in the dynamic formulation of entropy-regularized optimal transport.
Gaussian Process
Above, $p(x,t)$ may be any distribution or process. We now consider the most special case: $p(x,t=0)=p(x_0)$ is Gaussian, $\sim N(\mu(0), \sigma^2(0))$.
One can show that as long as the drift term of the Fokker-Planck equation is linear (with a state-independent diffusion coefficient), every $p(x,t)=p(x_t)$ remains Gaussian, $\sim N(\mu(t), \sigma^2(t))$. This is intuitive but important: we only need to track how $\mu(t)$ and $\sigma^2(t)$ evolve.
Assume $p(\boldsymbol{x}_t) = \mathcal{N}(\mu(t), \sigma^2(t))$ with initial condition $\mu(0) = x_0$ and $\sigma^2(0)=0$ (or very small).
1. Computing $\mu(t)$ and $\sigma^2(t)$
We start from the Itô SDE \(d \boldsymbol{x}_t = \boldsymbol{f}(\boldsymbol{x}_t, t) d t+g(t) d \boldsymbol{w}_t\) For linearity we take $\boldsymbol{f}(\boldsymbol{x}_t, t)=-\frac{1}{2}\beta(t) \boldsymbol{x}_t$, an OU process, with $\beta(t)=0$ as the special (pure-diffusion) case: \(d \boldsymbol{x}_t = -\frac{1}{2}\beta(t) \boldsymbol{x}_t d t+g(t) d \boldsymbol{w}_t\) Since $\boldsymbol{x}_t\sim N(\mu(t), \sigma^2(t))$, each sample path can be written $\boldsymbol{x}_t = \mu(t) + \sigma(t) \boldsymbol{z}_t$.
The differential equation for the mean $\boldsymbol{\mu}(t)=\boldsymbol{\mu}_t$: take expectations on both sides of the Itô SDE, using $\mathbb{E}[\boldsymbol{x}_t] =\boldsymbol{\mu}_t$ and $\mathbb{E}[d\boldsymbol{w}_t] =0$: \(\frac{d\boldsymbol{\mu}_t}{dt} = -\frac{1}{2}\beta(t) \boldsymbol{\mu}_t\) \(\quad \boldsymbol{\mu}(t) = \boldsymbol{x}_0 \cdot \exp\left(-\frac{1}{2} \int_0^t \beta(s) ds \right) = \sqrt{\alpha(t)} \boldsymbol{x}_0, \quad \text{ where }\alpha(t) = e^{-\int_0^t \beta(s) ds}\) with $\alpha(0)=1$ and $\alpha(\infty) \to 0$.
A special case: $\beta(t)=\beta$ constant gives $\alpha(t) = e^{-\beta t}$, an exponential decay.
In practice, $\beta(t)$ follows a linear or a cosine schedule.
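For concreteness, a small sketch computing $\alpha(t) = e^{-\int_0^t \beta(s)ds}$ for a DDPM-style linear schedule (the values $\beta_{\min}=0.1$, $\beta_{\max}=20$ are illustrative, not prescribed here):

```python
import numpy as np

T = 1.0
t = np.linspace(0.0, T, 1001)

# Linear schedule: beta(t) = beta_min + t * (beta_max - beta_min)
beta_min, beta_max = 0.1, 20.0

# alpha(t) = exp(-int_0^t beta(s) ds); the integral has a closed form here
int_beta = beta_min * t + 0.5 * (beta_max - beta_min) * t**2
alpha = np.exp(-int_beta)

print(alpha[0], alpha[-1])  # alpha(0) = 1 and alpha(T) ~ e^{-10.05} ~ 4e-5
```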
The differential equation for the variance $\boldsymbol{\sigma}^2(t)$ (via Itô's lemma):
\[d\left(\boldsymbol{x}_t^2\right) = 2 \boldsymbol{x}_t\, d\boldsymbol{x}_t + (d\boldsymbol{x}_t)^2 \Rightarrow \frac{d}{dt} \mathbb{E}[\boldsymbol{x}_t^2] = 2 \mathbb{E}[\boldsymbol{x}_t\, \text{drift}] + \mathbb{E}[\text{diffusion}^2]\] \[\frac{d{\sigma}^2(t)}{dt} = -\beta(t) {\sigma}^2(t) + g^2(t)\]Solving (Appendix T): \(\sigma^2(t) = \sigma^2(0) \exp\left(-\int_0^t \beta(r)\,dr\right) + \int_0^t g^2(s) \exp\left(-\int_s^t \beta(r)\,dr\right) ds\) Using $\alpha(t) = e^{-\int_0^t \beta(s)ds}$, this can be written as \(\boxed{ \sigma^2(t) \;=\; \alpha(t)\left(\,\int_0^t\frac{g^2(s)}{\alpha(s)}\,ds + \sigma^2(0)\right)}.\) The term $\sigma^2(0)\alpha(t)$ decays until it is negligible. Assuming $\sigma^2(0) = 0$, \(\boxed{ \sigma^2(t) \;=\; \alpha(t)\,\int_0^t\frac{g^2(s)}{\alpha(s)}\,ds . }\) Special case: with $g^2(t)$ and $\beta(t)$ constant, $\sigma^2(t) = \frac{g^2}{\beta}(1-e^{-\beta t})$, with maximum value $\frac{g^2}{\beta}$.
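As a check of the boxed formula, the sketch below integrates $\dot\sigma^2 = -\beta\sigma^2 + g^2$ directly by Euler stepping and compares it with the closed form, using arbitrary constant $\beta$ and $g^2$:

```python
import numpy as np

beta, g2 = 2.0, 1.5      # arbitrary constants for the check
T, n = 1.0, 20_000
dt = T / n
t = np.linspace(0.0, T, n + 1)

# Direct Euler integration of d(sigma^2)/dt = -beta sigma^2 + g^2, sigma^2(0) = 0
var = np.zeros(n + 1)
for i in range(n):
    var[i + 1] = var[i] + dt * (-beta * var[i] + g2)

# Closed form for constant coefficients: sigma^2(t) = (g^2/beta)(1 - e^{-beta t})
closed = (g2 / beta) * (1.0 - np.exp(-beta * t))
print(np.max(np.abs(var - closed)))  # ~1e-4: the two agree up to Euler error
```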
2. Computing $\frac{\partial p}{\partial t}$
The time derivative of the Gaussian $p(\boldsymbol{x}, t) = \mathcal{N}(\mu(t), \sigma^2(t)I)$ is: \(\frac{\partial p}{\partial t} = p \left[ \frac{(\boldsymbol{x}-\mu) \dot{\mu}}{\sigma^2} + \frac{(\boldsymbol{x}-\mu)^2 - d\sigma^2}{2\sigma^4} \dot{\sigma^2} \right]\) Substituting the linearity conditions $\dot{\mu} = -\frac{1}{2}\beta(t)\mu$ and $\dot{\sigma^2} = -\beta(t)\sigma^2 + g^2(t)$ gives: \(\begin{align} \frac{\partial p}{\partial t} &= p \left[ -\frac{\beta(t)(\boldsymbol{x}-\mu)\mu}{2\sigma^2} + \frac{(\boldsymbol{x}-\mu)^2 - d\sigma^2}{2\sigma^4} \left(-\beta(t)\sigma^2 + g^2(t)\right) \right]\\ &= p \left[ -\beta(t)\frac{x(x-\mu) - d \sigma^2}{2\sigma^2} +\frac{(\boldsymbol{x}-\mu)^2 - d\sigma^2}{2\sigma^4} g^2(t) \right]\\ \end{align}\)
3. Computing $\frac{\partial \log p}{\partial t} = \frac{1}{p}\frac{\partial p}{\partial t}$
\(\frac{\partial \log p}{\partial t} = \left[ \frac{(\boldsymbol{x}-\mu) \dot{\mu}}{\sigma^2} + \frac{(\boldsymbol{x}-\mu)^2 - d\sigma^2}{2\sigma^4} \dot{\sigma^2} \right]\) Substituting $\dot{\mu}$ and $\dot{\sigma^2}$: \(\frac{\partial \log p}{\partial t} = \left[ \underbrace{-\beta(t)\frac{x(x-\mu) - d \sigma^2}{2\sigma^2}}_{\text{drift}} +\underbrace{\frac{(\boldsymbol{x}-\mu)^2 - d\sigma^2}{2\sigma^4} g^2(t)}_{\text{diffusion}} \right]\)
Compare with: \(\frac{\partial p(x,t)}{\partial t}=-\nabla \cdot[\boldsymbol{f}(x,t)\, p(x,t)]+\frac{g^2(t)}{2} \Delta p(x,t)\) and the alternative form: \(\begin{aligned} \frac{\partial\log p(x,t)}{\partial t}= -\boldsymbol{f}(x,t)\cdot \nabla\log p(x,t) - \nabla \cdot \boldsymbol{f}(x,t) +\frac{g^2(t)}{2}\left[\Delta \log p(x,t) + \| \nabla \log p(x,t) \|^2\right]\\ \end{aligned}\)
- $\nabla \cdot[\boldsymbol{f}(x,t)\, p(x,t)] = \left[ \beta(t)\frac{x(x-\mu) - d \sigma^2}{2\sigma^2} \right] p$
- Decomposing further: $\nabla \cdot \boldsymbol{f}(x,t) = -\frac{\beta(t)d}{2}$ and $\boldsymbol{f}(x,t)\cdot \nabla\log p(x,t) = \frac{\beta(t)x(x-\mu)}{2\sigma^2(t)}$
- Score function: $\nabla \log p = -\frac{x-\mu}{\sigma^2}$
- Fisher Information density function: $|\nabla \log p|^2 = \frac{|x-\mu|^2}{\sigma^4}$
- $\Delta \log p = \frac{-d}{\sigma^2}$
Examples: VE, VP, Sub-VP
VE: Diffusion only forward SDE
mean: $\frac{d\boldsymbol{\mu}(t)}{dt} = \boldsymbol{f} = 0$, $\to \boldsymbol{\mu}(t) = \mathbf{x}_0$

or: $\beta(t)=0 \to \alpha(t)=1$, so $\boldsymbol{\mu}(t) = \mathbf{x}_0$

variance: $\frac{d{\sigma}^2(t)}{dt} = g^2(t) \to {\sigma}^2(t) = \int_0^t g^2(s)\,ds$ \[p(\mathbf{x}, t) = N\left(\mathbf{x}_0,\; \int_0^t g^2(s) \,ds\;\mathbf{I}\right) = N(\mathbf{x}_0, \sigma^2(t)\,\mathbf{I})\]Each sample path $\boldsymbol{x}_t$ can therefore be written as \[\boxed{\mathbf{x}_t = \mathbf{x}_0 + \sigma(t) \mathbf{z}_t }\] with $\sigma(t) = \sqrt{\int_0^t g^2(s)\,ds}$, which can grow very large (variance exploding).
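A sketch of VE forward sampling via this closed form (the data `x0` and the value of `sigma_t` are illustrative placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
x0 = rng.standard_normal((10_000, 2))        # stand-in for data samples

def ve_forward(x0, sigma_t, rng):
    """VE closed-form marginal: x_t = x_0 + sigma(t) * z with z ~ N(0, I)."""
    z = rng.standard_normal(x0.shape)
    return x0 + sigma_t * z

x_t = ve_forward(x0, sigma_t=5.0, rng=rng)
print(x_t.var(axis=0))  # ~ var(x0) + sigma^2 = 26: the variance explodes with sigma(t)
```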
The diffusion-only Fokker-Planck equation follows. Because the drift $\boldsymbol{f} = 0$, the partial and total derivatives coincide!
\[\frac{\partial p(x,t)}{\partial t}= \frac{d p(x,t)}{d t} = \frac{g^2(t)}{2} \Delta p(x,t)\] \[\begin{aligned} \frac{\partial\log p(x,t)}{\partial t}&= \frac{d \log p(x,t)}{d t}= \frac{g^2(t)}{2} \left[\Delta \log p(x,t) + \| \nabla \log p(x,t) \|^2\right]\\ \end{aligned}\]We verify that this satisfies the Fokker-Planck PDE (Appendix H): \(\boxed{ \frac{\partial p(x,t)}{\partial t} = \frac{g^2(t)}{2} \Delta p(x,t) = \frac{\dot{\sigma}^2(t)}{2} \left( \frac{\|x-x_0\|^2}{\sigma^4(t)} - \frac{d}{\sigma^2(t)} \right) p = \dot{\sigma}^2(t) \left( \frac{\|x-x_0\|^2 - d \sigma^2(t)}{2\sigma^4(t)} \right) p }\) where $d$ is the dimension.
\(\boxed{ \frac{\partial \log p(x,t)}{\partial t} = \frac{g^2(t)}{2} \left[\Delta \log p(x,t) + \| \nabla \log p(x,t) \|^2\right] = \dot{\sigma}^2(t) \left( \frac{\|x-x_0\|^2 - d \sigma^2(t)}{2\sigma^4(t)} \right) }\)
- Score function: $\nabla \log p = -\frac{x-x_0}{\sigma^2(t)}$
- Fisher Information density function: $|\nabla \log p|^2 = \frac{|x-x_0|^2}{\sigma^4(t)}$
- Integrating the density gives the Fisher information: $\frac{d}{\sigma^2(t)}$ \(\mathcal{I} = \mathbb{E}[\|\nabla \log p(x)\|^2] = -\mathbb{E}[\Delta \log p(x)]\)
- The Laplacian of the log-likelihood is spatially constant: $\Delta \log p = \frac{-d}{\sigma^2(t)}$, satisfying \(\mathbb{E}[\|\nabla \log p(x)\|^2] = -\mathbb{E}[\Delta \log p(x)] = \frac{d}{\sigma^2(t)}\)
Verify probability conservation for the diffusion process. (Probability conservation also holds when drift is present, and the diffusion and drift contributions each vanish separately.) \(\mathbb{E}\left[\frac{\partial \log p}{\partial t}\right] = \mathbb{E}\left[\frac{g^2(t)}{2} \left(\|\nabla \log p\|^2 +\Delta \log p \right)\right]=0\) The entropy of a pure diffusion always increases: $\sigma^2(t) = \int_0^t g^2(s)\,ds$ is an increasing function. \(\begin{aligned} \frac{d H(t)}{dt} &= -\int \frac{\partial p(x,t)}{\partial t}\log p(x,t)dx \\ &= \underbrace{\frac{g^2(t)}{2} I(p)}_{\text{Diffusion contribution}} = \frac{d}{2}\frac{d \log \sigma^2(t)}{d t} > 0 \end{aligned}\) The first $d$ in the last expression is the dimension.
OU process
\[d\boldsymbol{x}_t = -\frac{1}{2}\beta(t) \boldsymbol{x}_t \,dt + g(t)\, d\boldsymbol{w}_t\]This defines an OU process with time-dependent coefficients.
Fokker–Planck Equation
The corresponding PDE is:
\[\frac{\partial p}{\partial t} = \nabla \cdot\left(\frac{1}{2}\beta(t) \boldsymbol{x} p \right) + \frac{g^2(t)}{2} \Delta p\]Each sample path: $\boldsymbol{x}_t = \mu(t) + \sigma(t) \boldsymbol{z}_t$
The differential equation for the mean: \(\frac{d\boldsymbol{\mu}(t)}{dt} = -\frac{1}{2}\beta(t) \boldsymbol{\mu}(t)\) \(\quad \boldsymbol{\mu}(t) = \boldsymbol{x}_0 \cdot \exp\left(-\frac{1}{2} \int_0^t \beta(s) ds \right) = \sqrt{\alpha(t)} \boldsymbol{x}_0, \quad \text{ where }\alpha(t) = e^{-\int_0^t \beta(s) ds}\) with $\alpha(0)=1$ and $\alpha(\infty) \to 0$.
The differential equation for the variance (assuming $\sigma(0)=0$): \(\frac{d{\sigma}^2(t)}{dt} = -\beta(t) {\sigma}^2(t) + g^2(t)\) \(\boxed{ \sigma^2(t) \;=\; \alpha(t)\,\int_0^t\frac{g^2(s)}{\alpha(s)}\,ds. }\)
Different noise schedules $g(t)$ correspond to different methods:
VP - SDE: $g(t) = \sqrt{\beta(t)}$
\(d\boldsymbol{x}_t = -\frac{1}{2}\beta(t) \boldsymbol{x}_t \,dt + \sqrt{\beta(t)}\, d\boldsymbol{w}_t\) \(g(t) = \sqrt{\beta(t)} \quad\Rightarrow\quad g^2(t) = \beta(t)\)
Recall the formula for variance of the Ornstein–Uhlenbeck process with time-dependent coefficients:
\[\sigma^2(t) = \alpha(t) \int_0^t \frac{g^2(s)}{\alpha(s)}\, ds \quad\text{where}\quad \alpha(t) = e^{-\int_0^t \beta(s)\, ds}\]Substituting $g^2(s) = \beta(s)$ into the formula:
\[\sigma^2(t) = \alpha(t) \int_0^t \frac{\beta(s)}{\alpha(s)}\, ds\]Now notice that:
\[\alpha'(s) = -\beta(s)\alpha(s) \quad\Rightarrow\quad \frac{\beta(s)}{\alpha(s)} = \frac{d}{ds}\left( \frac{1}{\alpha(s)} \right)\]So:
\[\int_0^t \frac{\beta(s)}{\alpha(s)}\, ds = \int_0^t \frac{d}{ds}\left( \frac{1}{\alpha(s)} \right) ds = \left[ \frac{1}{\alpha(s)} \right]_0^t = \left( \frac{1}{\alpha(t)} - \frac{1}{\alpha(0)} \right) = \frac{1}{\alpha(t)}-1\]Therefore:
\(\sigma^2(t) = \alpha(t) \left( \frac{1}{\alpha(t)} -1\right) = 1-\alpha(t)\) So finally $\boxed{\boldsymbol{x}_t = \sqrt{\alpha(t)}\mathbf{x}_0 + \sqrt{1-\alpha(t)} \boldsymbol{z}_t}$
$\alpha(t) = e^{-\int_0^t \beta(s) ds}$; even if $\beta(t)$ grows without bound, the noise is still suppressed by the drift term, so $\sigma^2(t) < 1$.
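A sketch of sampling from the VP closed-form marginal; `alpha_of_t` is a hypothetical helper reusing the illustrative linear-β schedule from the earlier sketch, and the data `x0` is a unit-variance placeholder:

```python
import numpy as np

def alpha_of_t(t, beta_min=0.1, beta_max=20.0):
    """alpha(t) = exp(-int_0^t beta(s) ds) for the linear beta schedule."""
    return np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t**2))

rng = np.random.default_rng(0)
x0 = rng.standard_normal((10_000, 2))     # unit-variance stand-in for data

t = 0.5
a = alpha_of_t(t)
x_t = np.sqrt(a) * x0 + np.sqrt(1.0 - a) * rng.standard_normal(x0.shape)
print(x_t.var(axis=0))  # ~1 at every t: variance preserving
```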
Sub-VP (Song): $g(t) = \sqrt{\beta(t)(1-\alpha^2(t))}$, Appendix L
\(g^2(t) = \beta(t)\left(1 - \alpha^2(t)\right)\) \(d\boldsymbol{x}_t = -\frac{1}{2}\beta(t) \boldsymbol{x}_t \,dt + \sqrt{\beta(t)(1-\alpha^2(t))}\, d\boldsymbol{w}_t\) The idea is to weight $\beta(t)$ over time so that little noise is injected near $t = 0$; at later times it behaves much like VP.
We want to simplify:
\[\sigma^2(t) = \alpha(t) \int_0^t \frac{\beta(s) \left(1 - \alpha^2(s)\right)}{\alpha(s)}\, ds\] \[= \alpha(t) \int_0^t \beta(s)\left(\frac{1}{\alpha(s)} - \alpha(s)\right)\, ds\] \[\sigma^2(t) = \alpha(t) \left[ \int_0^t \frac{\beta(s)}{\alpha(s)}\, ds - \int_0^t \beta(s) \alpha(s)\, ds \right]\]Use earlier identity:
\[\int_0^t \frac{\beta(s)}{\alpha(s)}\, ds = \frac{1}{\alpha(t)} - 1\]And for the second integral, observe:
\[\frac{d\alpha(s)}{ds} = -\beta(s)\alpha(s) \Rightarrow \int_0^t \beta(s)\alpha(s)\, ds = -[\alpha(s)]_0^t = 1 - \alpha(t)\] \[\sigma^2(t) = \alpha(t) \left[ \left(\frac{1}{\alpha(t)} - 1\right) - (1 - \alpha(t)) \right]\] \[= \alpha(t) \left[ \frac{1}{\alpha(t)} - 2 + \alpha(t) \right] = 1 - 2\alpha(t) + \alpha^2(t)\] \[\boxed{ \sigma^2(t) = (1 - \alpha(t))^2 }\]- $\sigma^2(0) = 0$
- $\sigma^2(t) \to 1$ as $\alpha(t) \to 0$
- Always non-negative
So finally $\boxed{\boldsymbol{x}_t = \sqrt{\alpha(t)}\mathbf{x}_0 + ({1-\alpha(t)}) \boldsymbol{z}_t}$
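A small sketch comparing the two closed forms, $1-\alpha(t)$ for VP vs. $(1-\alpha(t))^2$ for Sub-VP, under the same illustrative linear β schedule (this is the comparison the figure below is meant to show):

```python
import numpy as np

t = np.linspace(0.0, 1.0, 501)
beta_min, beta_max = 0.1, 20.0
alpha = np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t**2))

var_vp = 1.0 - alpha             # VP:     sigma^2(t) = 1 - alpha(t)
var_subvp = (1.0 - alpha) ** 2   # Sub-VP: sigma^2(t) = (1 - alpha(t))^2

print(var_vp[50], var_subvp[50])   # early time: Sub-VP injects far less noise
print(var_vp[-1], var_subvp[-1])   # both approach 1 as alpha(t) -> 0
```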
The VP curve in the comparison below is not correct; it shows a normalized result and needs to be fixed.
![[Pasted image 20250504090729.png]]
- Reverse SDE, illustrated with VP: $x_0 \to 0$, and the noise is the same!!
Forward vs. reverse
Conclusion: without the random term, the differential equations are time reversible; only the sign of time flips. For $p(x,t)$, $\frac{\partial p(x,t)}{\partial t}$, and $\frac{d \log p(x(t),t)}{dt}$, the forward and reverse PDE/ODE are identical up to the time reversal $dt \to -dt$.
For $x_t$, however, although the forward and reverse Itô SDEs are both drift + diffusion (random), the reverse SDE needs an additional drift correction coming from the score function.
Itô SDE: reverse $\ne$ forward
Time reversal: define the reverse process $\boldsymbol{y}_s = \boldsymbol{x}_{T - s}$, where $s$ is the reverse-time variable. By the time-reversal theorem for diffusion processes, the drift of the reverse process must be adjusted to compensate for the drift and diffusion of the original process.
Adjusting the drift: the reverse drift starts from the negative of the original drift and adds a correction driven by the gradient of the probability density (i.e., the score function). The correction takes the form $g(t)^2 \nabla_x \log p(\boldsymbol{x}_t, t)$, with $D(t) = \frac{g(t)^2}{2}$.
The diffusion term is unchanged: the reverse SDE keeps the same diffusion term, $g(t) d\boldsymbol{w}_t$.
The reverse SDE reads (here $t$ runs backward): \(d\boldsymbol{x}_t = \left[ \boldsymbol{f}(\boldsymbol{x}_t, t) - g(t)^2 \nabla_x \log p(\boldsymbol{x}_t, t) \right] dt + g(t) d\boldsymbol{w}_t\)
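A sketch of reverse-SDE sampling with Euler-Maruyama, stepping backward in time. To keep it self-contained, the data distribution is a point mass at a hypothetical `x0`, so the VP marginal is Gaussian and the score is available in closed form (in a real model a learned score network would replace `score`); the schedule values are the same illustrative ones as before.

```python
import numpy as np

beta_min, beta_max = 0.1, 20.0
beta = lambda t: beta_min + t * (beta_max - beta_min)
alpha = lambda t: np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t**2))

x0 = 2.0  # data = point mass at x0, so p_t = N(sqrt(alpha(t)) x0, 1 - alpha(t))

def score(x, t):
    """Analytic score of the VP marginal N(sqrt(alpha(t)) x0, (1 - alpha(t)) I)."""
    return -(x - np.sqrt(alpha(t)) * x0) / (1.0 - alpha(t))

rng = np.random.default_rng(0)
T, n = 1.0, 1000
dt = T / n
x = rng.standard_normal(100_000)            # start from the prior p_T ~ N(0, 1)
for i in range(n):                          # integrate t from T down to ~0
    t = T - i * dt
    drift = -0.5 * beta(t) * x - beta(t) * score(x, t)  # f - g^2 * score, g^2 = beta
    x += drift * (-dt) + np.sqrt(beta(t) * dt) * rng.standard_normal(x.size)
print(x.mean(), x.var())                    # concentrates near x0 = 2
```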
ODE: reverse = forward (DDIM)
Probability-flow continuity equation: the ODE's drift must reproduce the same probability evolution as the original SDE's Fokker-Planck equation. Comparing the continuity equation with the Fokker-Planck equation yields the adjusted drift.
Adjusting the drift: subtract $\frac{g(t)^2}{2} \nabla_x \log p(\boldsymbol{x}_t, t)$ from the original drift, removing the effect of the randomness.
The corresponding ODE is: \(d\boldsymbol{x}_t = \left[ \boldsymbol{f}(\boldsymbol{x}_t, t) - \frac{g(t)^2}{2} \nabla_x \log p(\boldsymbol{x}_t, t) \right] dt\)
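The same setup with the probability-flow ODE: the score term is halved and the noise dropped, so each trajectory is deterministic (DDIM-like) while the marginals stay the same. A self-contained sketch under the same assumptions as the reverse-SDE example:

```python
import numpy as np

beta_min, beta_max = 0.1, 20.0
beta = lambda t: beta_min + t * (beta_max - beta_min)
alpha = lambda t: np.exp(-(beta_min * t + 0.5 * (beta_max - beta_min) * t**2))
x0 = 2.0  # data = point mass at x0, as before

def score(x, t):
    return -(x - np.sqrt(alpha(t)) * x0) / (1.0 - alpha(t))

rng = np.random.default_rng(1)
T, n = 1.0, 1000
dt = T / n
x = rng.standard_normal(100_000)                 # start from p_T ~ N(0, 1)
for i in range(n):
    t = T - i * dt
    drift = -0.5 * beta(t) * x - 0.5 * beta(t) * score(x, t)  # f - (g^2/2) score
    x += drift * (-dt)                           # deterministic: no noise term
print(x.mean(), x.var())  # same marginals as the reverse SDE, deterministic paths
```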
Fokker-Planck: reverse = forward (Appendix G)
When the reverse SDE is constructed correctly, the marginal probability density of the reverse process matches that of the forward process at corresponding times. Concretely, if the forward time is $t \in [0, T]$, then the marginal PDF of the reverse process at time $s = T - t$ equals that of the forward process at time $t$:
\[p_{\text{reverse}}(x, s) = p_{\text{forward}}(x, T - s).\]The reverse Fokker-Planck equation is: \(\boxed{ \frac{\partial p_{\text{reverse}}(\boldsymbol{x}, t)}{\partial t} = \nabla \cdot \left[ \boldsymbol{f}(\boldsymbol{x}, t) p_{\text{reverse}}(\boldsymbol{x}, t) \right] - \frac{g^2(t)}{2} \Delta p_{\text{reverse}}(\boldsymbol{x}, t) }\)
Under the time reversal $s = T - t$ it takes the same form as the forward Fokker-Planck equation, which guarantees the symmetry of the marginal densities: \(p_{\text{reverse}}(\boldsymbol{x}, T - t) = p_{\text{forward}}(\boldsymbol{x}, t).\) This symmetry holds provided that:
- The drift of the reverse SDE includes the correction term $g^2(t) \nabla_x \log p(\boldsymbol{x}_t, t)$, ensuring continuity of the probability flow.
- The initial conditions are symmetric: the distribution of the reverse process at $s = 0$ (i.e., $t = T$) equals that of the forward process at $t = T$.
Intuition
- Example from diffusion models: in denoising diffusion, the forward process gradually adds noise and the reverse process denoises step by step. If the reverse drift correction is right (it includes the score $\nabla_x \log p$), the marginal distribution at every time point matches the forward process.
Reverse Gaussian process
Anderson derived the reverse diffusion process corresponding to a forward diffusion process.
We again consider the most special case: $p(x,t=0)=p(x_0)$ is Gaussian, $\sim N(\mu(0), \sigma^2(0))$, corresponding to running $t$ backward.
Every $p(x,t)=p(x_t)$ is again Gaussian, $\sim N(\mu(t), \sigma^2(t))$, because the reverse process is also a Gaussian process.
Assume $p(\boldsymbol{x}_t) = \mathcal{N}(\mu(t), \sigma^2(t))$ with initial condition $\mu(0)$, $\sigma^2(0)$ (typically $N(0, I)$).
Computing $\mu(t)$ and $\sigma^2(t)$
We start from the (reverse) Itô SDE \(d\boldsymbol{x}_t = \left[ \boldsymbol{f}(\boldsymbol{x}_t, t) - g(t)^2 \nabla_x \log p(\boldsymbol{x}_t, t) \right] dt + g(t) d\boldsymbol{w}_t\) For linearity we take $\boldsymbol{f}(\boldsymbol{x}_t, t)=-\frac{1}{2}\beta(t) \boldsymbol{x}_t$, an OU process, with $\beta(t)=0$ as the special case: \(d \boldsymbol{x}_t = \left[-\frac{1}{2}\beta(t) \boldsymbol{x}_t + g^2(t) \frac{x_t-\mu(t)}{\sigma^2(t)}\right] d t+g(t) d \boldsymbol{w}_t\) Since $\boldsymbol{x}_t\sim N(\mu(t), \sigma^2(t))$, each sample path can be written $\boldsymbol{x}_t = \mu(t) + \sigma(t) \boldsymbol{z}_t$.
The differential equation for the mean $\boldsymbol{\mu}(t)=\boldsymbol{\mu}_t$:
Take expectations on both sides of the Itô SDE: $\mathbb{E}[\boldsymbol{x}_t] =\boldsymbol{\mu}_t$, $\mathbb{E}[d\boldsymbol{w}_t] =0$, and, crucially, $\mathbb{E}[\nabla_x \log p(\boldsymbol{x}_t, t)]=0$. Viewed in $t$, the reverse mean therefore satisfies exactly the same equation as the forward mean! Viewed in $\tau=T-t$, it differs by a sign. \(\frac{d\boldsymbol{\mu}_t}{dt} = -\frac{1}{2}\beta(t) \boldsymbol{\mu}_t\) \(\quad \boldsymbol{\mu}(t) = \boldsymbol{x}_0 \cdot \exp\left(-\frac{1}{2} \int_0^t \beta(s) ds \right) = \sqrt{\alpha(t)} \boldsymbol{x}_0, \quad \text{ where }\alpha(t) = e^{-\int_0^t \beta(s) ds}\) $\alpha(0)=1$ and $\alpha(\infty) \to 0$.
The differential equation for the variance $\boldsymbol{\sigma}^2(t)$ (unlike the mean, it differs from the forward case by a sign; Appendix U): \(d\left(\boldsymbol{x}_t^2\right) = 2 \boldsymbol{x}_t\, d\boldsymbol{x}_t + (d\boldsymbol{x}_t)^2 \Rightarrow \frac{d}{dt} \mathbb{E}[\boldsymbol{x}_t^2] = 2 \mathbb{E}[\boldsymbol{x}_t\, \text{drift}] + \mathbb{E}[\text{diffusion}^2]\)
[!The following equation is wrong!] \(\frac{d{\sigma}^2(t)}{dt} = [-\beta(t)+\frac{2g^2(t)}{\sigma^2(t)}] {\sigma}^2(t) + g^2(t) = -\beta(t) {\sigma}^2(t) + 3g^2(t)\)
The correct approach: use $\tau = T - t$, \(d \boldsymbol{x}_{\tau} = \left[-\frac{1}{2}\beta(T-\tau) \boldsymbol{x}_{\tau} + g^2(T-\tau) \frac{x_{\tau}-\mu_{\tau}}{\sigma^2_{\tau}}\right] (-d \tau)+g(T-\tau) d \boldsymbol{w}_{\tau}\) The reverse mean equation: \(\begin{aligned} \frac{d\boldsymbol{\mu}_{\tau}}{d\tau} &= \frac{1}{2}\beta(T-\tau) \boldsymbol{\mu}_{\tau}\\ &= -\frac{d\boldsymbol{\mu}}{dt}\bigg|_{t=T-\tau} \end{aligned}\) The reverse variance equation: \(\begin{aligned} \frac{d{\sigma}_r^2(\tau)}{d\tau} &= \left[\beta(T-\tau)-\frac{2g^2(T-\tau)}{\sigma_r^2(\tau)}\right] {\sigma}_r^2(\tau) + g^2(T-\tau)\\ &= \beta(T-\tau) {\sigma}_r^2(\tau) - g^2(T-\tau)\\ &= -\frac{d{\sigma}^2}{dt}\bigg|_{t=T-\tau} \end{aligned}\)
Conclusion: we used the Gaussian case as the example here. Although the forward and reverse SDEs differ, the Gaussian distribution is time reversed. This is also true in general for non-Gaussian distributions (from the time reversal of Fokker-Planck): \(\frac{d\boldsymbol{\mu}_r}{d\tau} = -\frac{d\boldsymbol{\mu}}{dt}\bigg|_{t=T-\tau}, \quad \frac{d\boldsymbol{\sigma}^2_r}{d\tau} = -\frac{d\boldsymbol{\sigma}^2}{dt}\bigg|_{t=T-\tau}\)
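A quick check of this conclusion: integrate the forward variance ODE up to $t=T$, then integrate the reverse ODE $\frac{d\sigma_r^2}{d\tau} = \beta\sigma_r^2 - g^2$ from that endpoint and confirm it retraces the forward curve (constant $\beta$, $g$ chosen arbitrarily):

```python
import numpy as np

beta, g2 = 2.0, 1.0
T, n = 1.0, 100_000
dt = T / n

var_fwd = np.zeros(n + 1)           # forward: dv/dt = -beta v + g^2, v(0) = 0
for i in range(n):
    var_fwd[i + 1] = var_fwd[i] + dt * (-beta * var_fwd[i] + g2)

var_rev = np.zeros(n + 1)           # reverse: dv/dtau = +beta v - g^2
var_rev[0] = var_fwd[-1]            # start from sigma^2(T)
for i in range(n):
    var_rev[i + 1] = var_rev[i] + dt * (beta * var_rev[i] - g2)

# var_rev[k] should match var_fwd[n - k]: the reverse ODE retraces the forward path
print(np.max(np.abs(var_rev - var_fwd[::-1])))  # small (Euler discretization error)
```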
VE: diffusion-only reverse SDE: $\beta(T-\tau)=0$
Mean: $\frac{d\boldsymbol{\mu}_r(\tau)}{d\tau} = 0$, $\to \boldsymbol{\mu}_r(\tau) = \boldsymbol{\mu}_r(\tau=0) =\boldsymbol{\mu}(T) =\mathbf{x}_0$
Variance: $\frac{d{\sigma}_r^2(\tau)}{d\tau} = -g^2(T-\tau)$ \(\begin{aligned} \sigma_r^2(\tau) &= \sigma_r^2(\tau=0)\underbrace{-\int_0^{\tau} g^2(T-s) d{s}}_{\text{var. decreasing}}\\ &= \sigma^2(T)-\int_{T-\tau}^T g^2(s) d{s}\\ &= \sigma^2(T)-(\sigma^2(T)-\sigma^2(T-\tau))\\ &= \sigma^2(T-\tau)= \sigma^2(t) \end{aligned}\) As $\tau$ runs from 0 to $T$, $t = T-\tau$ runs from $T$ to 0. The initial condition is $\sigma^2(T)$, and the variance decreases: this is a denoising process.
OU process: reverse SDE
We have already shown that the differential equations for the mean and variance are reversible. In principle the ODE solution map is one-to-one, but let us nevertheless compute the mean and variance explicitly.
Reverse mean: \(\begin{aligned} \boldsymbol{\mu}_{r}(\tau) &= \boldsymbol{\mu}_{r}(\tau=0) \exp\left(\frac{1}{2}\int_0^{\tau}\beta(T-s)ds\right)\\ &= \boldsymbol{\mu}(T) \underbrace{\exp\left(\frac{1}{2}\int_0^{\tau}\beta(T-s)ds\right)}_{\text{exponential growth}}\\ &= \boldsymbol{\mu}(T) \exp\left(\frac{1}{2}\int_{T-\tau}^{T}\beta(s)ds\right) \\ &= \boldsymbol{\mu}(T) \exp\left(\frac{1}{2}\left[\int_{0}^{T}\beta(s)ds-\int_{0}^{t}\beta(s)ds\right]\right) \\ &= \boldsymbol{\mu}(T) \sqrt{\frac{\alpha(t)}{\alpha(T)}} \\ &= \sqrt{\alpha(t)} \frac{\boldsymbol{\mu}(T)}{\sqrt{\alpha(T)}}=\boldsymbol{\mu}(t) \\ \end{aligned}\) Note that $\alpha(t)$ is an exponentially decaying function, while $\frac{\alpha(t)}{\alpha(T)}$ is exponentially growing. Here we used the forward-path formula (with $t = T - \tau$): \(\quad \boldsymbol{\mu}(t) = \boldsymbol{x}_0 \cdot \exp\left(-\frac{1}{2} \int_0^t \beta(s) ds \right) = \sqrt{\alpha(t)} \boldsymbol{x}_0, \quad \text{ where }\alpha(t) = e^{-\int_0^t \beta(s) ds}\)
Reverse variance: \(\begin{aligned} \sigma_r^2(\tau) \;&=\; e^{\int_0^{\tau}\beta(T-s)ds}\left[\sigma_r^2(\tau=0) - \int_0^{\tau} g^2(T-s)\, e^{-\int_0^{s}\beta(T-r)dr}\,ds\right]\\ \;=&\; \frac{\alpha(t)}{\alpha(T)}\left[\sigma^2(T)-\int_{T-\tau}^{T}\frac{g^2(s)\,\alpha(T)}{\alpha(s)}\,ds\right]\\ \;=&\; \frac{\alpha(t)}{\alpha(T)}\,\left[-\alpha(T)\int_{0}^{T}\frac{g^2(s)}{\alpha(s)}\,ds + \alpha(T)\int_{0}^{t}\frac{g^2(s)}{\alpha(s)}\,ds+ \alpha(T)\,\int_0^T\frac{g^2(s)}{\alpha(s)}\,ds + \alpha(T)\sigma^2(0)\right]\\ \;=&\; \alpha(t)\,\left[ \int_{0}^{t}\frac{g^2(s)}{\alpha(s)}\,ds+ \sigma^2(0)\right] = \sigma^2(t)\\ \end{aligned}\) Here we used the forward-path formula \(\sigma^2(T) \;=\; \alpha(T)\,\int_0^T\frac{g^2(s)}{\alpha(s)}\,ds + \alpha(T)\sigma^2(0)\) together with $\alpha(t) = \exp({-\int_0^t \beta(s) ds})$, $e^{-\int_0^s \beta(T-r)dr} = \frac{\alpha(T)}{\alpha(T-s)}$, and $e^{\int_0^{\tau} \beta(T-s) ds}=\frac{\alpha(t)}{\alpha(T)}$ with $t = T-\tau$.
The reverse VP and Sub-VP means and variances are simply the time reverse of the forward paths, so we do not repeat them.
Reference
MIT 6.S184: Flow Matching and Diffusion Models https://www.youtube.com/watch?v=GCoP2w-Cqtg&t=28s&ab_channel=PeterHolderrieth
Yaron Lipman et al. (Meta), Flow Matching for Generative Modeling, arXiv:2210.02747
An Introduction to Flow Matching: https://mlg.eng.cam.ac.uk/blog/2024/01/20/flow-matching.html
Appendix A:
First, the original equation is a convection-diffusion equation:
\[\frac{\partial p(x,t)}{\partial t} = -\nabla \cdot [\mathbf{u}(x,t) p(x,t)] + D(t) \Delta p(x,t)\]The proposed (wrong!) second equation is:
\[\frac{d \log p(x,t)}{dt} = -\nabla \cdot \mathbf{u}(x,t) + \nabla \cdot [D(t) \nabla \log p(x,t)]\]We need to check whether this second equation is correct.
Write $p(x,t) = \exp(\phi(x,t))$, i.e., $\phi(x,t) = \log p(x,t)$ and $p = e^\phi$. Substitute into the original equation:
- Left-hand side: \(\frac{\partial p}{\partial t} = e^\phi \frac{\partial \phi}{\partial t}\)
- Convection term on the right: \(-\nabla \cdot [\mathbf{u} p] = -\nabla \cdot [\mathbf{u} e^\phi] = -e^\phi (\nabla \cdot \mathbf{u} + \mathbf{u} \cdot \nabla \phi)\)
- Diffusion term on the right: \(D(t) \Delta p = D(t) \nabla \cdot (e^\phi \nabla \phi) = D(t) e^\phi (|\nabla \phi|^2 + \Delta \phi)\)
Dividing both sides of the original equation by $e^\phi$ gives: \(\frac{\partial \phi}{\partial t} = -\nabla \cdot \mathbf{u} - \mathbf{u} \cdot \nabla \phi + D(t) (|\nabla \phi|^2 + \Delta \phi)\)
Consider the material (total) derivative: \(\frac{d \phi}{dt} = \frac{\partial \phi}{\partial t} + \mathbf{u} \cdot \nabla \phi\)
Substituting the previous expression: \(\frac{d \phi}{dt} = -\nabla \cdot \mathbf{u} + D(t) (|\nabla \phi|^2 + \Delta \phi)\)
The proposed equation was: \(\frac{d \log p(x,t)}{dt} = -\nabla \cdot \mathbf{u} + \nabla \cdot [D(t) \nabla \log p(x,t)]\)
whose diffusion term on the right is: \(\nabla \cdot [D(t) \nabla \phi] = D(t) \Delta \phi\)
Comparing the two: the correct result contains $D(t)(\|\nabla \phi\|^2 + \Delta \phi)$, while the proposed one is missing the $D(t)\|\nabla \phi\|^2$ term; the proposed equation is therefore incorrect.
A follow-up question: is the following cleaner formulation valid?
\[\frac{d}{dt} \log p(x(t), t) = -\nabla \cdot f + \frac{1}{2} g(t)^2 \nabla \cdot \nabla \log p = -\nabla \cdot f + \frac{1}{2} g(t)^2 \nabla^2 \log p\]This is almost correct, and it actually is correct under a specific assumption: when you treat the total derivative deterministically (ignoring Itô correction terms that come from stochastic calculus). Let’s see why, and when it’s valid or not.
🧮 Two Ways to Think About the Total Derivative
✅ 1. Expected (mean-field) evolution — what the density evolves like
If you’re tracking how $\log p(x(t), t)$ evolves on average, then using the Fokker–Planck equation and applying the chain rule gives:
\[\frac{d}{dt} \log p(x(t), t) = \partial_t \log p + f \cdot \nabla \log p + \frac{1}{2} g(t)^2 \left( \nabla^2 \log p + \|\nabla \log p\|^2 \right)\]This is the Itô formula applied to $\log p$. The term $|\nabla \log p|^2$ arises due to the nonlinear transformation (log of a stochastic function).
🚫 2. What you’re proposing:
\[\frac{d}{dt} \log p = -\nabla \cdot f + \frac{1}{2} g(t)^2 \nabla^2 \log p\]This is actually the Eulerian time derivative of $\log p$, derived directly from the Fokker–Planck equation (dividing both sides by $p$). But this is not the same as the Itô total derivative along a trajectory.
Let me show it explicitly:
📘 Derivation from Fokker–Planck:
Start from:
\[\partial_t p = -\nabla \cdot (f p) + \frac{1}{2} g(t)^2 \nabla^2 p\]Divide both sides by $p$:
\[\partial_t \log p = -\nabla \cdot f - f \cdot \nabla \log p + \frac{1}{2} g(t)^2 \left( \nabla^2 \log p + \|\nabla \log p\|^2 \right)\]Now take the total derivative:
\[\frac{d}{dt} \log p = \partial_t \log p + \dot{x} \cdot \nabla \log p\]Plug in $\dot{x} = f + \text{noise}$. The expected value (ignoring stochastic term) is:
\[f \cdot \nabla \log p\]So when you add $f \cdot \nabla \log p$ to the expression above, the two $f \cdot \nabla \log p$ terms cancel:
\[\boxed{ \frac{d}{dt} \log p = -\nabla \cdot f + \frac{1}{2} g(t)^2 \left( \nabla^2 \log p + \|\nabla \log p\|^2 \right) }\]So the correct form must include the $|\nabla \log p|^2$ term due to the Itô correction.
✅ Conclusion
So to answer directly:
\[\frac{d}{dt} \log p = -\nabla \cdot f + \frac{1}{2} g^2 \nabla^2 \log p?\]Why not
Because it misses the nonlinear Itô term $\frac{1}{2} g^2 |\nabla \log p|^2$, which is required when taking the total derivative of a nonlinear function of a stochastic process (like $\log p$).
This extra term comes from:
\[\frac{1}{2} \text{Tr}(g^2 H_{\log p}) + \frac{1}{2} \| g \nabla \log p \|^2\]If you’re okay with expectation-level or weak approximations, your version might suffice. But for pathwise correctness, you need the full Itô expression.
Appendix G
To derive the Fokker-Planck equation corresponding to the reverse SDE, we start from the modified drift $\boldsymbol{f}' = -\boldsymbol{f} + g^2(t) \nabla_x \log p(\boldsymbol{x}_t, t)$ together with the diffusion term $g(t)$. The detailed steps:
1. Fokker-Planck equation of the forward SDE
The forward SDE is: \(d\boldsymbol{x}_t = \boldsymbol{f}(\boldsymbol{x}_t, t) dt + g(t) d\boldsymbol{w}_t,\) with the corresponding Fokker-Planck equation: \(\frac{\partial p(\boldsymbol{x}, t)}{\partial t} = -\nabla \cdot \left[ \boldsymbol{f}(\boldsymbol{x}, t) p(\boldsymbol{x}, t) \right] + \frac{g^2(t)}{2} \Delta p(\boldsymbol{x}, t),\) where $D(t) = \frac{g^2(t)}{2}$.
2. Construction of the reverse SDE
The reverse drift is corrected to: \(\boldsymbol{f}' = -\boldsymbol{f} + g^2(t) \nabla_x \log p(\boldsymbol{x}_t, t),\) so the reverse SDE takes the form: \(d\boldsymbol{x}_t = \left[ -\boldsymbol{f}(\boldsymbol{x}_t, t) + g^2(t) \nabla_x \log p(\boldsymbol{x}_t, t) \right] dt + g(t) d\boldsymbol{w}_t.\)
3. Deriving the reverse Fokker-Planck equation
The Fokker-Planck equation of the reverse process is given by: \(\frac{\partial p_{\text{reverse}}(\boldsymbol{x}, t)}{\partial t} = -\nabla \cdot \left[ \boldsymbol{f}'(\boldsymbol{x}, t) p_{\text{reverse}}(\boldsymbol{x}, t) \right] + \frac{g^2(t)}{2} \Delta p_{\text{reverse}}(\boldsymbol{x}, t).\)
Substituting $\boldsymbol{f}' = -\boldsymbol{f} + g^2(t) \nabla_x \log p$: \(\frac{\partial p_{\text{reverse}}}{\partial t} = -\nabla \cdot \left[ \left( -\boldsymbol{f} + g^2 \nabla_x \log p \right) p_{\text{reverse}} \right] + \frac{g^2}{2} \Delta p_{\text{reverse}}.\)
Expanding the bracketed term: \(\begin{aligned} \nabla \cdot \left[ \left( -\boldsymbol{f} + g^2 \nabla_x \log p \right) p_{\text{reverse}} \right] &= -\nabla \cdot \left[ \boldsymbol{f} p_{\text{reverse}} \right] + g^2 \nabla \cdot \left[ (\nabla_x \log p) p_{\text{reverse}} \right]. \end{aligned}\)
Note that $\nabla_x \log p = \frac{\nabla_x p}{p}$, so \((\nabla_x \log p)\, p_{\text{reverse}} = \nabla_x p \quad \text{(when the marginals coincide, i.e., } p_{\text{reverse}} = p\text{)}.\)
Substituting back gives: \(g^2 \nabla \cdot \left[ \nabla_x p \right] = g^2 \Delta p.\)
The reverse Fokker-Planck equation therefore simplifies to: \(\frac{\partial p_{\text{reverse}}}{\partial t} = \nabla \cdot \left[ \boldsymbol{f} p_{\text{reverse}} \right] - g^2 \Delta p_{\text{reverse}} + \frac{g^2}{2} \Delta p_{\text{reverse}}.\)
Combining the diffusion terms: \(\frac{\partial p_{\text{reverse}}}{\partial t} = \nabla \cdot \left[ \boldsymbol{f} p_{\text{reverse}} \right] - \frac{g^2}{2} \Delta p_{\text{reverse}}.\)
4. Time-reversal symmetry
To align with the forward process, define the reverse time $s = T - t$; the sign of the time derivative flips: \(\frac{\partial p_{\text{reverse}}}{\partial s} = -\frac{\partial p_{\text{reverse}}}{\partial t}.\)
Substituting into the reverse equation: \(\frac{\partial p_{\text{reverse}}}{\partial s} = -\nabla \cdot \left[ \boldsymbol{f} p_{\text{reverse}} \right] + \frac{g^2}{2} \Delta p_{\text{reverse}}.\)
This matches the form of the forward Fokker-Planck equation: \(\frac{\partial p_{\text{forward}}}{\partial t} = -\nabla \cdot \left[ \boldsymbol{f} p_{\text{forward}} \right] + \frac{g^2}{2} \Delta p_{\text{forward}}.\)
Hence, in reverse time $s$, the probability density of the reverse process evolves symmetrically to the forward process.
5. Conclusion
The reverse Fokker-Planck equation is: \(\boxed{ \frac{\partial p_{\text{reverse}}(\boldsymbol{x}, t)}{\partial t} = \nabla \cdot \left[ \boldsymbol{f}(\boldsymbol{x}, t) p_{\text{reverse}}(\boldsymbol{x}, t) \right] - \frac{g^2(t)}{2} \Delta p_{\text{reverse}}(\boldsymbol{x}, t) }\)
Under the time reversal $s = T - t$ it has the same form as the forward Fokker-Planck equation, guaranteeing the symmetry of the marginal densities: \(p_{\text{reverse}}(\boldsymbol{x}, T - t) = p_{\text{forward}}(\boldsymbol{x}, t).\)
Appendix H
We verify that the Gaussian solution satisfies the Fokker-Planck equation, confirming the above:
🧪 Goal: verify that the Gaussian solution satisfies the Fokker-Planck equation
We treat the pure-diffusion SDE:
\[d\boldsymbol{x}_t = g(t)\, d\boldsymbol{w}_t\]The corresponding Fokker-Planck equation is:
\[\frac{\partial p}{\partial t} = \frac{g^2(t)}{2} \Delta p\]and the candidate solution is (an isotropic Gaussian centered at $\boldsymbol{x}_0$):
\[p(x, t) = \frac{1}{(2\pi \sigma^2(t))^{d/2}} \exp\left(-\frac{\|x - x_0\|^2}{2\sigma^2(t)}\right), \quad \text{where } \sigma^2(t) = \int_0^t g^2(s)\, ds\]🔹 Step 1: compute $\partial_t p$
\[p(x,t) = A(t) \cdot \exp\left( -\frac{r^2}{2\sigma^2(t)} \right), \quad \text{where } r^2 = \|x - x_0\|^2\]▸ $A(t)$:
\[A(t) = \frac{1}{(2\pi \sigma^2(t))^{d/2}} \Rightarrow \frac{dA}{dt} = -\frac{d}{2} \frac{\dot{\sigma}^2(t)}{\sigma^2(t)} A(t)\]▸ chain rule:
\[\frac{\partial p}{\partial t} = \dot{A}(t)\cdot e^{-\frac{r^2}{2\sigma^2}} + A(t) \cdot \frac{\partial}{\partial t} \left(-\frac{r^2}{2\sigma^2(t)} \right) \cdot e^{-\frac{r^2}{2\sigma^2(t)}}\]Since $\dot{\sigma}^2(t) = g^2(t)$:
\[\frac{\partial p}{\partial t} = p(x,t) \left[ -\frac{d}{2} \frac{g^2(t)}{\sigma^2(t)} + \frac{r^2}{2} \cdot \frac{g^2(t)}{\sigma^4(t)} \right]\]🔹 Step 2: compute $\Delta p$
We already know:
\[\nabla p = -\frac{x - x_0}{\sigma^2(t)} p(x,t) \quad \Rightarrow \quad \Delta p = \left( \frac{\|x - x_0\|^2}{\sigma^4(t)} - \frac{d}{\sigma^2(t)} \right) p(x,t)\]🔹 Step 3: compute the right-hand side $\frac{g^2(t)}{2} \Delta p$
\[\frac{g^2(t)}{2} \Delta p = \frac{g^2(t)}{2} \left( \frac{r^2}{\sigma^4(t)} - \frac{d}{\sigma^2(t)} \right) p(x,t)\]This is exactly the $\partial_t p$ computed above!
✅ Conclusion
\[\boxed{ \frac{\partial p}{\partial t} = \frac{g^2(t)}{2} \Delta p }\]This proves that the Gaussian solution
\[p(x,t) = \mathcal{N}\left(x_0, \sigma^2(t) \cdot \mathbf{I}\right) \quad \text{with } \sigma^2(t) = \int_0^t g^2(s) ds\]indeed satisfies the pure-diffusion Fokker-Planck equation.
Verifying a case with drift (e.g., the OU process) proceeds similarly, with slightly more algebra.
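The same verification can also be done symbolically. A SymPy sketch for the 1-D case, taking $g \equiv 1$ so that $\sigma^2(t) = t$ (a concrete instance, not the general case):

```python
import sympy as sp

t = sp.Symbol("t", positive=True)
x, x0 = sp.symbols("x x0", real=True)

sigma2 = t  # sigma^2(t) = int_0^t g^2(s) ds with g = 1
p = sp.exp(-(x - x0) ** 2 / (2 * sigma2)) / sp.sqrt(2 * sp.pi * sigma2)

lhs = sp.diff(p, t)                          # dp/dt
rhs = sp.Rational(1, 2) * sp.diff(p, x, 2)   # (g^2/2) * Laplacian of p, with g = 1
print(sp.simplify(lhs - rhs))                # 0: the heat kernel solves the FP equation
```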
Appendix I
We complete the last verification: in the pure-diffusion case ($\mathbf{u}_t = 0$), the solution indeed satisfies the Fokker-Planck equation, and the mean velocity field matches the theory.
✅ What to verify
- SDE (pure diffusion): \(d\boldsymbol{x}_t = g(t)\, d\boldsymbol{w}_t\), where $g(t) \in \mathbb{R}$ and $\boldsymbol{w}_t \sim \mathcal{N}(0, \mathbf{I})$.
- The claimed solution (the distribution of the position):
\[p(\boldsymbol{x}, t) = \mathcal{N}\left(\boldsymbol{x}_0,\; \sigma^2(t)\cdot \mathbf{I} \right),\quad \text{where } \sigma^2(t) = \int_0^t g^2(s)\, ds\]
- Two things to verify:
1. The solution satisfies the Fokker-Planck equation
\[\frac{\partial p}{\partial t} = D(t)\, \Delta p,\quad D(t) = \frac{g^2(t)}{2}\]
2. The Lagrangian total-derivative form holds
\[\frac{d \log p}{dt} = \frac{g^2(t)}{2} \left[ \Delta \log p + \|\nabla \log p\|^2 \right]\]
🧠 Step 1: write out $\log p(x, t)$
\[p(x, t) = \frac{1}{(2\pi \sigma^2(t))^{d/2}} \exp\left( -\frac{\|x - x_0\|^2}{2\sigma^2(t)} \right)\]Taking the log:
\[\log p(x, t) = -\frac{d}{2} \log (2\pi \sigma^2(t)) - \frac{\|x - x_0\|^2}{2\sigma^2(t)}\]🧠 Step 2: compute the two terms on the right
We need to compute:
\[\Delta \log p + \|\nabla \log p\|^2\]▸ $\nabla \log p$
\[\nabla \log p = -\frac{x - x_0}{\sigma^2(t)} \quad \Rightarrow \quad \|\nabla \log p\|^2 = \frac{\|x - x_0\|^2}{\sigma^4(t)}\]▸ $\Delta \log p$
\[\Delta \log p = -\nabla \cdot \left( \frac{x - x_0}{\sigma^2(t)} \right) = -\frac{d}{\sigma^2(t)}\]so the sum is:
\[\Delta \log p + \|\nabla \log p\|^2 = -\frac{d}{\sigma^2(t)} + \frac{\|x - x_0\|^2}{\sigma^4(t)}\]🧠 Step 3: compute $\frac{d}{dt} \log p(x, t)$
We just differentiate $\log p(x, t)$ with respect to $t$:
\[\frac{d}{dt} \log p = -\frac{d}{2} \cdot \frac{\dot{\sigma}^2(t)}{\sigma^2(t)} + \frac{\|x - x_0\|^2}{2} \cdot \left( \frac{\dot{\sigma}^2(t)}{\sigma^4(t)} \right)\]Since $\dot{\sigma}^2(t) = g^2(t)$:
\[\frac{d}{dt} \log p = -\frac{d}{2} \cdot \frac{g^2(t)}{\sigma^2(t)} + \frac{\|x - x_0\|^2}{2} \cdot \frac{g^2(t)}{\sigma^4(t)}\]Rearranging:
\[\frac{d}{dt} \log p = \frac{g^2(t)}{2} \left( -\frac{d}{\sigma^2(t)} + \frac{\|x - x_0\|^2}{\sigma^4(t)} \right)\]✅ Final comparison
\[\frac{d \log p}{dt} = \frac{g^2(t)}{2} \left[ \Delta \log p + \|\nabla \log p\|^2 \right]\]The right-hand side is exactly what we just computed:
\[\Delta \log p + \|\nabla \log p\|^2 = -\frac{d}{\sigma^2(t)} + \frac{\|x - x_0\|^2}{\sigma^4(t)}\]⟹ The verification matches exactly!
✅ Conclusion
Under pure diffusion ($\mathbf{u}=0$), the Lagrangian total-derivative formula indeed holds:
\[\frac{d \log p}{dt} = \frac{g^2(t)}{2} \left[ \Delta \log p + \|\nabla \log p\|^2 \right]\]This formula precisely describes the evolution of the log-probability density; beyond being a PDE identity, it also carries geometric and thermodynamic meaning (entropy production, score dynamics, and so on).
Appendix J: Ornstein-Uhlenbeck (OU) Process
We now add drift:
\[d\boldsymbol{x}_t = -\frac{1}{2}\beta(t) \boldsymbol{x}_t \,dt + g(t)\, d\boldsymbol{w}_t\]This defines an OU process with time-dependent coefficients.
Fokker–Planck Equation
The corresponding PDE is:
\[\frac{\partial p}{\partial t} = \nabla \cdot\left(\frac{1}{2}\beta(t) \boldsymbol{x} p \right) + \frac{g^2(t)}{2} \Delta p\]Expanding the divergence term:
\[\nabla \cdot\left(\frac{1}{2}\beta(t) \boldsymbol{x} p \right) = \frac{1}{2}\beta(t) \left( d p + \boldsymbol{x} \cdot \nabla p \right)\]Let $p(x,t) = \mathcal{N}(\mu(t), \alpha^2(t)) = \frac{1}{(2\pi \alpha^2(t))^{d/2}} \exp\left(-\frac{|x - \mu(t)|^2}{2\alpha^2(t)}\right)$, where:
\[\alpha^2(t) = \int_0^t \exp\left(-\int_s^t \beta(r)\, dr \right) g^2(s)\, ds, \quad \mu(0) = x_0, \quad \mu(t) = x_0 \cdot \exp\left(-\frac{1}{2} \int_0^t \beta(s) ds \right).\](Note: in this appendix $\alpha^2(t)$ denotes the variance, i.e., the $\sigma^2(t)$ of the main text.) Then:
\[\frac{\partial p}{\partial t} = p(x,t) \left[ -\frac{d}{2} \cdot \frac{\dot{\alpha}^2(t)}{\alpha^2(t)} + \frac{\|x - \mu(t)\|^2}{2} \cdot \frac{\dot{\alpha}^2(t)}{\alpha^4(t)} + \frac{(x - \mu(t))^\top \dot{\mu}(t)}{\alpha^2(t)} \right]\]Also:
\[\Delta p = \left( \frac{\|x - \mu(t)\|^2}{\alpha^4(t)} - \frac{d}{\alpha^2(t)} \right) p(x,t), \\ \nabla p = -\frac{x - \mu(t)}{\alpha^2(t)} p(x,t), \\ \boldsymbol{x} \cdot \nabla p = -\frac{\|x - \mu(t)\|^2 + \mu(t)^\top (x - \mu(t))}{\alpha^2(t)} p(x,t)\]Then the divergence term becomes:
\[\nabla \cdot\left(\frac{1}{2}\beta(t) \boldsymbol{x} p \right) = \frac{1}{2} \beta(t) \left( d - \frac{\|x - \mu(t)\|^2 + \mu(t)^\top (x - \mu(t))}{\alpha^2(t)} \right) p(x,t)\]Putting it together:
\[\frac{\partial p}{\partial t} = \frac{1}{2} \beta(t) \left( d - \frac{\|x - \mu(t)\|^2 + \mu(t)^\top (x - \mu(t))}{\alpha^2(t)} \right) p(x,t) + \frac{g^2(t)}{2} \left( \frac{\|x - \mu(t)\|^2}{\alpha^4(t)} - \frac{d}{\alpha^2(t)} \right) p(x,t)\]Using $\dot{\alpha}^2(t) = -\beta(t)\alpha^2(t) + g^2(t)$ and $\dot{\mu}(t) = -\frac{1}{2}\beta(t)\mu(t)$, this matches $\partial_t p$.
Let $p(x,t) = \mathcal{N}(\mu(t), \alpha^2(t))$. Then:
\[\frac{\partial}{\partial t} \log p = -\frac{d}{2} \cdot \frac{\dot{\alpha}^2(t)}{\alpha^2(t)} + \frac{\|x - \mu(t)\|^2}{2} \cdot \frac{\dot{\alpha}^2(t)}{\alpha^4(t)} + \frac{(x - \mu(t))^\top \dot{\mu}(t)}{\alpha^2(t)}\]On the other hand:
\[\nabla \log p = -\frac{x - \mu(t)}{\alpha^2(t)}, \quad \Delta \log p = -\frac{d}{\alpha^2(t)}, \quad \|\nabla \log p\|^2 = \frac{\|x - \mu(t)\|^2}{\alpha^4(t)}\]Hence, adding the convective term $f \cdot \nabla \log p$ to form the material derivative and simplifying:
\[\frac{d}{dt} \log p = -\nabla \cdot f(x,t) + \frac{g^2(t)}{2} \left[ \Delta \log p + \|\nabla \log p\|^2 \right]\]with $f(x,t) = -\frac{1}{2}\beta(t) x$ and $\nabla \cdot f = -\frac{d}{2}\beta(t)$.
Conclusion
Both pure diffusion and Ornstein-Uhlenbeck processes with Gaussian solutions satisfy their respective Fokker–Planck equations and the log-probability total derivative (Lagrangian) formulation. This provides a strong analytical foundation for diffusion modeling, score-based generative models, and reverse-time SDE analysis.
Appendix K
Here we use:
\[g(t) = \sqrt{\beta(t)} \quad\Rightarrow\quad g^2(t) = \beta(t)\]Recall the formula for variance of the Ornstein–Uhlenbeck process with time-dependent coefficients:
\[\sigma^2(t) = \alpha(t) \int_0^t \frac{g^2(s)}{\alpha(s)}\, ds \quad\text{where}\quad \alpha(t) = e^{-\int_0^t \beta(s)\, ds}\]Substituting $g^2(s) = \beta(s)$ into the formula:
\[\sigma^2(t) = \alpha(t) \int_0^t \frac{\beta(s)}{\alpha(s)}\, ds\]Now notice that:
\[\alpha'(s) = -\beta(s)\alpha(s) \quad\Rightarrow\quad \frac{\beta(s)}{\alpha(s)} = \frac{d}{ds}\left( \frac{1}{\alpha(s)} \right)\]So:
\[\int_0^t \frac{\beta(s)}{\alpha(s)}\, ds = \int_0^t \frac{d}{ds}\left( \frac{1}{\alpha(s)} \right) ds = \left[ \frac{1}{\alpha(s)} \right]_0^t = \left( \frac{1}{\alpha(t)} - \frac{1}{\alpha(0)} \right) = \frac{1}{\alpha(t)}-1\]Therefore:
\[\sigma^2(t) = \alpha(t) \left( \frac{1}{\alpha(t)} -1\right) = 1- \alpha(t)\]✅ Final Result:
\[\boxed{ \sigma^2(t) = 1 - \alpha(t) \quad\text{where}\quad \alpha(t) = e^{-\int_0^t \beta(s)\, ds} }\]So the variance is entirely determined by the exponential decay factor $\alpha(t)$ from the deterministic part of the SDE.
Appendix L
We now derive the variance $\sigma^2(t)$ of the Ornstein-Uhlenbeck process when:
SDE:
\[d\boldsymbol{x}_t = -\frac{1}{2}\beta(t) \boldsymbol{x}_t \, dt + g(t)\, d\boldsymbol{w}_t\]Let:
- $\alpha(t) = e^{ -\int_0^t \beta(s)\, ds }$
- Deterministic (noise-free) mean: $\mu(t) = \sqrt{\alpha(t)} x_0$
- General solution: $\boldsymbol{x}_t = \mu(t) + \boldsymbol{z}_t$, with variance $\sigma^2(t) = \mathbb{E}[\boldsymbol{z}_t^2]$
Variance formula:
\[\sigma^2(t) = \alpha(t) \int_0^t \frac{g^2(s)}{\alpha(s)}\, ds\]Given:
\[g^2(t) = \beta(t)\left(1 - \alpha^2(t)\right)\]We want to simplify:
\[\sigma^2(t) = \alpha(t) \int_0^t \frac{\beta(s) \left(1 - \alpha^2(s)\right)}{\alpha(s)}\, ds\] \[= \alpha(t) \int_0^t \beta(s)\left(\frac{1}{\alpha(s)} - \alpha(s)\right)\, ds\]Break into two terms:
\[\sigma^2(t) = \alpha(t) \left[ \int_0^t \frac{\beta(s)}{\alpha(s)}\, ds - \int_0^t \beta(s) \alpha(s)\, ds \right]\]Use earlier identity:
\[\int_0^t \frac{\beta(s)}{\alpha(s)}\, ds = \frac{1}{\alpha(t)} - 1\]And for the second integral, observe:
\[\frac{d\alpha(s)}{ds} = -\beta(s)\alpha(s) \Rightarrow \int_0^t \beta(s)\alpha(s)\, ds = -[\alpha(s)]_0^t = 1 - \alpha(t)\]Plug both in:
\[\sigma^2(t) = \alpha(t) \left[ \left(\frac{1}{\alpha(t)} - 1\right) - (1 - \alpha(t)) \right]\] \[= \alpha(t) \left[ \frac{1}{\alpha(t)} - 2 + \alpha(t) \right] = 1 - 2\alpha(t) + \alpha^2(t)\]✅ Final Answer:
\[\boxed{ \sigma^2(t) = (1 - \alpha(t))^2 }\]This is a very elegant result, and:
- $\sigma^2(0) = 0$
- $\sigma^2(t) \to 1$ as $\alpha(t) \to 0$
- Always non-negative
Appendix M
We will continue the derivation from:
\[\frac{d H(t)}{dt} = \int \left[\nabla \cdot(\boldsymbol{f}(x,t)\, p(x,t)) - \frac{g^2(t)}{2} \Delta p(x,t)\right] \log p(x,t)\, dx\]We now handle the two terms in the integrand separately.
1. The drift term:
\[\int \nabla \cdot(\boldsymbol{f}(x,t)\, p(x,t)) \log p(x,t) \, dx\]Use integration by parts (divergence theorem) in reverse, assuming boundary terms vanish (e.g., decay at infinity):
\[\int \nabla \cdot(\boldsymbol{f}\, p) \log p \, dx = -\int \boldsymbol{f}(x,t)\, p(x,t) \cdot \nabla \log p(x,t) \, dx\]Using the identity $\nabla \log p = \frac{\nabla p}{p}$, we simplify:
\[= -\int \boldsymbol{f}(x,t) \cdot \nabla p(x,t) \, dx\]Now integrate by parts again, assuming boundary terms vanish:
\[= \int \nabla \cdot \boldsymbol{f}(x,t)\, p(x,t) \, dx\]2. The diffusion term:
\[-\frac{g^2(t)}{2} \int \Delta p(x,t) \log p(x,t)\, dx\]We use integration by parts, with the identity:
\[\int \Delta p \log p \, dx = -\int \frac{\|\nabla p\|^2}{p} \, dx = -\int p(x,t) \left\|\nabla \log p(x,t)\right\|^2 dx\]Hence, the diffusion contribution becomes:
\[\frac{g^2(t)}{2} \int p(x,t) \left\|\nabla \log p(x,t)\right\|^2 dx\]This is the Fisher information $I(p) = \int p |\nabla \log p|^2 dx$.
Final expression:
Combining both terms:
\[\frac{d H(t)}{dt} = \int \nabla \cdot \boldsymbol{f}(x,t)\, p(x,t)\, dx + \frac{g^2(t)}{2} \int p(x,t) \left\|\nabla \log p(x,t)\right\|^2 dx\]Or more compactly:
\[\boxed{ \frac{d H(t)}{dt} = \mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)] + \frac{g^2(t)}{2} I(p) }\]where:
- $\mathbb{E}[\nabla \cdot \boldsymbol{f}]$ is the expected divergence of the drift,
- $I(p)$ is the Fisher information of the distribution $p(x,t)$.
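For a Gaussian OU solution this boxed identity can be checked in closed form: $H = \frac{d}{2}\log(2\pi e\sigma^2)$, $\mathbb{E}[\nabla\cdot\boldsymbol{f}] = -\frac{d}{2}\beta$, and $I(p) = d/\sigma^2$. A SymPy sketch for $d=1$ with constant $\beta$ and $g^2$ (symbols chosen here for illustration):

```python
import sympy as sp

t = sp.Symbol("t", positive=True)
beta, g2 = sp.symbols("beta g2", positive=True)

sigma2 = (g2 / beta) * (1 - sp.exp(-beta * t))       # OU variance with sigma^2(0) = 0
H = sp.Rational(1, 2) * sp.log(2 * sp.pi * sp.E * sigma2)  # entropy of N(mu, sigma^2)

lhs = sp.diff(H, t)                    # dH/dt
rhs = -beta / 2 + (g2 / 2) / sigma2    # E[div f] + (g^2/2) I(p), with d = 1
print(sp.simplify(lhs - rhs))          # 0: the entropy identity holds exactly
```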
Appendix N
We analyze the sign of:
\[\frac{dH(t)}{dt} = \mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)] + \frac{g^2(t)}{2} I(p)\]This tells us how the differential entropy $H(t) = -\int p(x,t) \log p(x,t) dx$ evolves over time, for a probability distribution $p(x,t)$ governed by a Fokker–Planck equation with drift $\boldsymbol{f}(x,t)$ and scalar diffusion coefficient $g(t)$.
Term-by-term analysis
1. Diffusion term: $\frac{g^2(t)}{2} I(p)$
- Always non-negative, since $g^2(t) \ge 0$ and Fisher information $I(p) \ge 0$.
- Represents the entropy-increasing effect of diffusion (spreading out the distribution).
- Strictly positive unless $p$ is uniform or a Dirac delta (infinite entropy case).
2. Drift term: $\mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)]$
- Can be positive, negative, or zero, depending on the vector field $\boldsymbol{f}(x,t)$.
- If $\nabla \cdot \boldsymbol{f} < 0$ (e.g., a contracting flow), this term decreases entropy.
- If $\nabla \cdot \boldsymbol{f} > 0$ (e.g., expanding flow), this term increases entropy.
Overall sign of $\frac{dH}{dt}$?
We cannot assert the sign of $\frac{dH}{dt}$ in general, because it depends on the balance between drift and diffusion:
- If diffusion dominates (large $g(t)$, or small $\nabla \cdot \boldsymbol{f}$), entropy increases: $\frac{dH}{dt} > 0$
- If drift dominates, and especially if it’s compressive: $\frac{dH}{dt} < 0$
- If they balance: $\frac{dH}{dt} = 0$, which can happen in stationary cases
Special case: Pure diffusion (no drift)
If $\boldsymbol{f}(x,t) = 0$, then:
\[\frac{dH(t)}{dt} = \frac{g^2(t)}{2} I(p) \ge 0\]So entropy always increases — this is consistent with the heat equation, where a peaked distribution spreads out over time.
Special case: Deterministic flow (no diffusion)
If $g(t) = 0$, then:
\[\frac{dH(t)}{dt} = \mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)]\]This can be positive or negative depending on whether the deterministic flow expands or contracts space.
Conclusion
We cannot generally say whether $\frac{dH}{dt}$ is positive or negative without knowing more about the drift and diffusion. However:
- Diffusion always increases entropy
- Drift can increase or decrease entropy, depending on whether it compresses or expands probability mass.
This interplay is fundamental in stochastic processes and nonequilibrium thermodynamics.
Appendix O
The correct expression for the time derivative of Shannon entropy is:
\[\boxed{ \frac{dH(t)}{dt} = -\int \frac{\partial p(x,t)}{\partial t} \log p(x,t)\, dx }\]Why not the material derivative $\frac{d p}{dt}$?
The material derivative $\frac{d p}{dt} = \frac{\partial p}{\partial t} + \mathbf{u} \cdot \nabla p$ describes how $p$ changes along a moving particle’s path in a flow field $\mathbf{u}(x,t)$. It’s used in fluid mechanics or stochastic processes when tracking individual trajectories.
However, entropy is a global functional of the probability density:
\[H(t) = -\int p(x,t) \log p(x,t) \, dx\]So when you take its time derivative, you apply the chain rule to the integrand:
\[\frac{dH}{dt} = -\int \left( \frac{\partial p}{\partial t} \log p + \frac{\partial p}{\partial t} \right) dx\]But since $\int \frac{\partial p}{\partial t} dx = \frac{d}{dt} \int p(x,t) dx = 0$, that second term vanishes.
So you’re left with:
\[\frac{dH}{dt} = -\int \frac{\partial p(x,t)}{\partial t} \log p(x,t) \, dx\]✅ Therefore:
Use the partial derivative $\partial p / \partial t$, not the material derivative $d p / dt$.
Check: the (incorrect) result obtained if the total derivative is used instead
\[\begin{aligned} \frac{d H(t)}{dt} &= -\int \left[\frac{d p(x,t)}{d t}\log p(x,t) + p(x,t) \frac{1}{p(x,t)}\frac{d p(x,t)}{d t}\right]dx\\ &= -\int \frac{d p(x,t)}{d t}\log p(x,t)dx - \int \frac{d p(x,t)}{d t}dx\\ &= -\int \frac{d p(x,t)}{d t}\log p(x,t)dx - \frac{d}{d t}\underbrace{\int p(x,t) dx}_{=1}\\ &= -\int \frac{d p(x,t)}{d t}\log p(x,t)dx \\ &= \int \left[\nabla \cdot[\boldsymbol{f}(x,t)]\, p(x,t)-\frac{g^2(t)}{2} \Delta p(x,t)\right] \log p(x,t)dx \\ &= \underbrace{\int\nabla \cdot[\boldsymbol{f}(x,t)]\, p(x,t)\log p(x,t)}_{\text{no simplification}} + \underbrace{\frac{g^2(t)}{2} I(p)}_{\text{Diffusion contribution}}\\ &\ne \frac{\partial H}{\partial t} \end{aligned}\]Appendix Q
We ask whether the following holds:
\[\boxed{ \mathbb{E}_{x \sim p(x,t)}\left[ \frac{\partial}{\partial t} \log p(x,t) \right] = 0\ ? }\]Yes — this is correct, under mild regularity conditions. Here’s why.
✅ Step-by-step Derivation
Let’s define the expectation:
\[\mathbb{E}_{p} \left[ \frac{\partial}{\partial t} \log p(x,t) \right] = \int p(x,t) \cdot \frac{\partial}{\partial t} \log p(x,t)\, dx\]Now use the identity:
\[\frac{\partial}{\partial t} \log p(x,t) = \frac{1}{p(x,t)} \cdot \frac{\partial p(x,t)}{\partial t}\]So the integrand becomes:
\[p(x,t) \cdot \frac{1}{p(x,t)} \cdot \frac{\partial p(x,t)}{\partial t} = \frac{\partial p(x,t)}{\partial t}\]Therefore:
\[\mathbb{E} \left[ \frac{\partial}{\partial t} \log p(x,t) \right] = \int \frac{\partial p(x,t)}{\partial t}\, dx\]And this is just the time derivative of total probability:
\[\int \frac{\partial p(x,t)}{\partial t} \, dx = \frac{d}{dt} \left( \int p(x,t)\, dx \right) = \frac{d}{dt} (1) = 0\]✅ Final Answer
\[\boxed{ \mathbb{E}_{x \sim p(x,t)}\left[ \frac{\partial}{\partial t} \log p(x,t) \right] = 0 }\]This holds as long as $p(x,t)$ is a valid probability density function — i.e., normalized and sufficiently smooth for differentiation under the integral sign.
Appendix R
We now look at the total derivative of $\log p(x(t), t)$ along a trajectory $x(t)$, and ask whether its expectation is zero:
❓ Question:
Is the expectation of the total time derivative
\[\boxed{ \mathbb{E}_{x(t) \sim p(x,t)}\left[ \frac{d}{dt} \log p(x(t), t) \right] = 0\ ? }\]🧠 Step 1: Total derivative (chain rule)
We apply the total derivative along a trajectory $x(t)$, using the chain rule:
\[\frac{d}{dt} \log p(x(t), t) = \frac{\partial}{\partial t} \log p(x,t) + \nabla \log p(x,t) \cdot \frac{dx}{dt}\]So the expectation becomes:
\[\mathbb{E} \left[ \frac{d}{dt} \log p(x(t),t) \right] = \mathbb{E} \left[ \frac{\partial}{\partial t} \log p(x,t) \right] + \mathbb{E} \left[ \nabla \log p(x,t) \cdot \frac{dx}{dt} \right]\]From earlier, we already showed:
\[\mathbb{E} \left[ \frac{\partial}{\partial t} \log p(x,t) \right] = 0\]So now:
\[\mathbb{E} \left[ \frac{d}{dt} \log p(x(t),t) \right] = \mathbb{E} \left[ \nabla \log p(x,t) \cdot \frac{dx}{dt} \right]\]🧮 Step 2: Model dynamics of $x(t)$
Let’s suppose $x(t)$ follows a stochastic differential equation (SDE):
\[dx = \boldsymbol{f}(x,t)\, dt + \sqrt{2D(t)}\, dW_t\]Then the expected velocity (drift) is:
\[\mathbb{E}[ \nabla \log p(x,t) \cdot \dot{x} ] = \mathbb{E}[ \nabla \log p(x,t) \cdot \boldsymbol{f}(x,t) ]\]The noise term (from $dW_t$) has zero mean and does not contribute in expectation due to Itô calculus.
✅ Final result:
So the expectation is:
\[\mathbb{E} \left[ \frac{d}{dt} \log p(x(t),t) \right] = \mathbb{E}_{x \sim p(x,t)} \left[ \nabla \log p(x,t) \cdot \boldsymbol{f}(x,t) \right]\]This is not necessarily zero.
🧠 Interpretation
This expression measures how much the flow $\boldsymbol{f}(x,t)$ aligns with the score $\nabla \log p(x,t)$. It shows up in:
- Score-based generative models (like diffusion models)
- Fisher divergence
- Stein’s identity in variational inference
📌 Final Answer:
\[\boxed{ \mathbb{E}_{x(t) \sim p(x,t)}\left[ \frac{d}{dt} \log p(x(t), t) \right] = \mathbb{E}_{x \sim p(x,t)} \left[ \nabla \log p(x,t) \cdot \boldsymbol{f}(x,t) \right] \neq 0 \text{ in general} }\]
Appendix S
The identity
\[\mathbb{E}[\nabla \log p(\boldsymbol{x})] = 0\]is true for any probability distribution $p(\boldsymbol{x})$, as long as $p(\boldsymbol{x})$ is differentiable and the integral involved exists (i.e., the expectations are well-defined).
🔍 Why this is true (general case)
Let’s derive it:
\[\mathbb{E}_{p}[\nabla \log p(\boldsymbol{x})] = \int \nabla \log p(\boldsymbol{x}) \cdot p(\boldsymbol{x})\, d\boldsymbol{x}\]Use the identity:
\[\nabla \log p(\boldsymbol{x}) \cdot p(\boldsymbol{x}) = \nabla p(\boldsymbol{x})\]So:
\[\mathbb{E}_{p}[\nabla \log p(\boldsymbol{x})] = \int \nabla p(\boldsymbol{x})\, d\boldsymbol{x} = \nabla \int p(\boldsymbol{x})\, d\boldsymbol{x} = \nabla 1 = 0\]✅ Summary
- Yes, $\mathbb{E}[\nabla \log p(\boldsymbol{x})] = 0$ holds for any differentiable probability density function $p(\boldsymbol{x})$ with proper support and normalization.
- It’s not specific to the Gaussian; the identity is general.
- This identity is often used in:
- Score matching
- Variational inference (reparameterization trick)
- Fisher Information derivation
Appendix T
You’re given the first-order linear ODE:
\[\frac{d\sigma^2(t)}{dt} = -\beta(t)\sigma^2(t) + g^2(t)\]This is a standard linear differential equation, and we solve it using the integrating factor method.
✅ Step 1: Standard form
Write the equation as:
\[\frac{d\sigma^2(t)}{dt} + \beta(t)\sigma^2(t) = g^2(t)\]✅ Step 2: Integrating factor
The integrating factor is:
\[I(t) = \exp\left( \int_0^t \beta(s)\, ds \right)\]Multiply both sides of the equation by $I(t)$:
\[I(t) \frac{d\sigma^2(t)}{dt} + I(t)\beta(t)\sigma^2(t) = I(t)g^2(t) \Rightarrow \frac{d}{dt} \left[ I(t)\sigma^2(t) \right] = I(t)g^2(t)\]✅ Step 3: Integrate both sides
\[I(t)\sigma^2(t) = \sigma^2(0) + \int_0^t I(s) g^2(s) \, ds\]Now solve for $\sigma^2(t)$:
\[\sigma^2(t) = \sigma^2(0) \cdot e^{-\int_0^t \beta(s) ds} + \int_0^t g^2(s) \cdot e^{-\int_s^t \beta(r)\, dr} ds\]✅ Final Answer:
\[\boxed{ \sigma^2(t) = \sigma^2(0) \cdot e^{-\int_0^t \beta(s)\, ds} + \int_0^t g^2(s) \cdot e^{-\int_s^t \beta(r)\, dr} \, ds }\]Next, consider the linear ODE:
\[\frac{d\sigma^2(t)}{dt} = -\beta(t)\sigma^2(t) - g^2(t)\]This is a first-order linear nonhomogeneous ODE, and we solve it using the integrating factor method.
✅ Step 1: Standard form
Bring it to this form:
\[\frac{d\sigma^2(t)}{dt} + \beta(t)\sigma^2(t) = -g^2(t)\]✅ Step 2: Integrating factor
The integrating factor is:
\[I(t) = \exp\left( \int_0^t \beta(s)\, ds \right)\]Multiply both sides of the equation by $I(t)$:
\[I(t) \frac{d\sigma^2(t)}{dt} + I(t) \beta(t)\sigma^2(t) = -I(t)g^2(t) \Rightarrow \frac{d}{dt} \left[ I(t)\sigma^2(t) \right] = -I(t)g^2(t)\]✅ Step 3: Integrate both sides
\[I(t)\sigma^2(t) = \sigma^2(0) - \int_0^t I(s) g^2(s) \, ds\]Now solve for $\sigma^2(t)$:
\[\sigma^2(t) = \sigma^2(0) \cdot e^{-\int_0^t \beta(s) ds} - \int_0^t g^2(s) \cdot e^{-\int_s^t \beta(r) dr} \, ds\]✅ Final Answer:
\[\boxed{ \sigma^2(t) = \sigma^2(0) \cdot e^{-\int_0^t \beta(s)\, ds} - \int_0^t g^2(s) \cdot e^{-\int_s^t \beta(r)\, dr} \, ds }\]Appendix U
We solve the SDE:
\[d \boldsymbol{x}_t = -\left[\frac{1}{2} \beta(t) \boldsymbol{x}_t + g^2(t) \cdot \frac{\boldsymbol{x}_t - \mu(t)}{\sigma^2(t)}\right] dt + g(t) d \boldsymbol{w}_t\]This is a linear time-varying stochastic differential equation with mean-reverting structure.
We aim to solve for the distribution of $\boldsymbol{x}_t$, i.e., compute:
- $\mathbb{E}[\boldsymbol{x}_t] = \mu(t)$
- $\text{Var}[\boldsymbol{x}_t] = \sigma^2(t)$
You already assumed that:
\[\boldsymbol{x}_t \sim \mathcal{N}(\mu(t), \sigma^2(t))\]Now we’ll verify this and solve for $\mu(t)$ and $\sigma^2(t)$ that are consistent with the dynamics.
🧭 Strategy
- Compute $d \mathbb{E}[\boldsymbol{x}_t]$ ⇒ solve for $\mu(t)$
- Compute $\frac{d}{dt} \mathbb{E}[\boldsymbol{x}_t^2]$ ⇒ solve for $\sigma^2(t)$
1. Drift of the mean
Take expectation of the SDE:
\[\frac{d\mu(t)}{dt} = \mathbb{E}\left[ -\left( \frac{1}{2} \beta(t) \boldsymbol{x}_t + g^2(t) \frac{\boldsymbol{x}_t - \mu(t)}{\sigma^2(t)} \right) \right] = -\frac{1}{2} \beta(t) \mu(t) - g^2(t) \cdot \underbrace{\mathbb{E}\left[\frac{\boldsymbol{x}_t - \mu(t)}{\sigma^2(t)}\right]}_{=0}\]So:
\[\boxed{ \frac{d\mu(t)}{dt} = -\frac{1}{2} \beta(t) \mu(t) } \Rightarrow \boxed{ \mu(t) = \mu(0) \cdot e^{- \frac{1}{2} \int_0^t \beta(s) \, ds} }\]2. Dynamics of the variance
Use Itô’s lemma:
\[d(\boldsymbol{x}_t^2) = 2\boldsymbol{x}_t\, d\boldsymbol{x}_t + (d\boldsymbol{x}_t)^2 \Rightarrow \frac{d}{dt} \mathbb{E}[\boldsymbol{x}_t^2] = 2\mathbb{E}[\boldsymbol{x}_t \cdot \text{drift}] + \mathbb{E}[\text{diffusion}^2]\]With the drift above, $\mathbb{E}[\boldsymbol{x}_t (\boldsymbol{x}_t - \mu(t))] = \sigma^2(t)$, so $2\mathbb{E}[\boldsymbol{x}_t \cdot \text{drift}] = -\beta(t)\mathbb{E}[\boldsymbol{x}_t^2] - 2g^2(t)$; adding $\mathbb{E}[\text{diffusion}^2] = g^2(t)$ gives:
\[\frac{d}{dt} \mathbb{E}[\boldsymbol{x}_t^2] = -\beta(t) \mathbb{E}[\boldsymbol{x}_t^2] - g^2(t)\]Letting $m_2(t) := \mathbb{E}[\boldsymbol{x}_t^2]$, and recalling:
\[\sigma^2(t) = m_2(t) - \mu^2(t)\]We differentiate:
\[\frac{d}{dt} \sigma^2(t) = \frac{d}{dt} m_2(t) - \frac{d}{dt} \mu^2(t)\]We already have:
- $\frac{d}{dt} m_2(t) = -\beta(t) m_2(t) - g^2(t)$
- $\frac{d}{dt} \mu^2(t) = 2\mu(t) \cdot \frac{d\mu(t)}{dt} = -\beta(t) \mu^2(t)$
So:
\[\frac{d}{dt} \sigma^2(t) = -\beta(t) m_2(t) - g^2(t) + \beta(t) \mu^2(t) = -\beta(t)(m_2(t) - \mu^2(t)) - g^2(t)\]Thus:
\[\boxed{ \frac{d}{dt} \sigma^2(t) = -\beta(t) \sigma^2(t) - g^2(t) }\]
This is precisely the second ODE solved in Appendix T, so we can reuse its solution.
✅ Final solution:
\[\boxed{ \mu(t) = \mu(0) \cdot e^{- \frac{1}{2} \int_0^t \beta(s) ds} }\] \[\boxed{ \sigma^2(t) = \sigma^2(0) \cdot e^{-\int_0^t \beta(s) ds} - \int_0^t g^2(s) \cdot e^{-\int_s^t \beta(r)\, dr} \, ds }\]So $\boldsymbol{x}_t \sim \mathcal{N}(\mu(t), \sigma^2(t))$ is indeed the solution of the original SDE.
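Since the closed form is fully explicit, a short Euler-Maruyama simulation can confirm it. The sketch below (our addition) assumes constant $\beta$ and $g$ for simplicity, in which case $\mu(t) = \mu(0)e^{-\beta t/2}$ and $\sigma^2(t) = \sigma^2(0)e^{-\beta t} - \frac{g^2}{\beta}(1 - e^{-\beta t})$; the horizon $T$ is kept short so that $\sigma^2(t)$ stays positive:
```python
# Euler-Maruyama check that the SDE's solution is N(mu(t), sigma^2(t)).
# Constant beta and g are an assumption made to get a simple closed form.
import numpy as np

rng = np.random.default_rng(0)
beta, g = 1.0, 0.5
mu0, var0 = 2.0, 1.0
T, n_steps, n_paths = 1.0, 2_000, 100_000
dt = T / n_steps

mu  = lambda t: mu0 * np.exp(-0.5 * beta * t)
var = lambda t: var0 * np.exp(-beta * t) - (g**2 / beta) * (1 - np.exp(-beta * t))

x = mu0 + np.sqrt(var0) * rng.standard_normal(n_paths)   # x_0 ~ N(mu0, var0)
for i in range(n_steps):
    t = i * dt
    drift = -(0.5 * beta * x + g**2 * (x - mu(t)) / var(t))
    x += drift * dt + g * np.sqrt(dt) * rng.standard_normal(n_paths)

print(f"mean: empirical {x.mean():.4f}  vs  mu(T) {mu(T):.4f}")
print(f"var : empirical {x.var():.4f}  vs  sigma^2(T) {var(T):.4f}")
```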
Appendix V
This appendix computes the score function in the reverse stochastic differential equation (SDE), groups the linear term in the drift, and solves for the mean and variance. We consider a general forward SDE and derive the reverse SDE without assuming the forward and reverse processes are identical; the key steps leverage the Fokker-Planck equation and properties of the score function.
1. Forward and Reverse SDEs
Consider the forward SDE: \(d\mathbf{x} = \mathbf{f}(\mathbf{x}, t) dt + g(t) d\mathbf{w}\) where $\mathbf{f}(\mathbf{x}, t)$ is the drift, $g(t)$ is the diffusion coefficient, and $\mathbf{w}$ is a standard Wiener process.
The reverse-time SDE is given by: \(d\mathbf{x} = \left[ \mathbf{f}(\mathbf{x}, t) - g(t)^2 \nabla_{\mathbf{x}} \log p_t(\mathbf{x}) \right] dt + g(t) d\bar{\mathbf{w}}\) where $\bar{\mathbf{w}}$ is a reverse-time Wiener process, and $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ is the score function of the marginal density $p_t(\mathbf{x})$.
2. Score Function and Linear Drift Assumption
The score $\nabla_{\mathbf{x}} \log p_t(\mathbf{x})$ is intractable in general. However, if the forward drift is linear: \(\mathbf{f}(\mathbf{x}, t) = \mathbf{A}(t) \mathbf{x} + \mathbf{b}(t)\) and the marginal distribution $p_t(\mathbf{x})$ is Gaussian (common in diffusion models), then: \(p_t(\mathbf{x}) = \mathcal{N}(\mathbf{x}; \boldsymbol{\mu}(t), \boldsymbol{\Sigma}(t))\) The score function simplifies to: \(\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) = -\boldsymbol{\Sigma}^{-1}(t) (\mathbf{x} - \boldsymbol{\mu}(t))\)
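A quick numerical check of this formula (an illustration; $\boldsymbol{\mu}$, $\boldsymbol{\Sigma}$, and the test point are arbitrary values) compares the analytic Gaussian score against central finite differences of $\log p$:
```python
# Finite-difference check of the Gaussian score: grad_x log p(x) should
# equal -Sigma^{-1} (x - mu). mu, Sigma, and x are illustrative values.
import numpy as np

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
Sigma_inv = np.linalg.inv(Sigma)

def log_p(x):
    d = x - mu
    return -0.5 * d @ Sigma_inv @ d - 0.5 * np.log(np.linalg.det(2 * np.pi * Sigma))

x = np.array([0.3, 0.7])
analytic = -Sigma_inv @ (x - mu)

eps = 1e-6
numeric = np.array([(log_p(x + eps * e) - log_p(x - eps * e)) / (2 * eps)
                    for e in np.eye(2)])
print("analytic:", analytic)   # the two should agree closely
print("numeric :", numeric)
```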
3. Reverse SDE with Grouped Linear Term
Substitute the score into the reverse SDE: \(d\mathbf{x} = \left[ \mathbf{A}(t) \mathbf{x} + \mathbf{b}(t) - g(t)^2 \left( -\boldsymbol{\Sigma}^{-1}(t) (\mathbf{x} - \boldsymbol{\mu}(t)) \right) \right] dt + g(t) d\bar{\mathbf{w}}\) Simplify the drift: \(d\mathbf{x} = \left[ \mathbf{A}(t) \mathbf{x} + \mathbf{b}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) (\mathbf{x} - \boldsymbol{\mu}(t)) \right] dt + g(t) d\bar{\mathbf{w}}\) Group linear and constant terms in $\mathbf{x}$: \(d\mathbf{x} = \left[ \underbrace{\left( \mathbf{A}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \right)}_{\text{Linear coefficient}} \mathbf{x} + \underbrace{\left( \mathbf{b}(t) - g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \boldsymbol{\mu}(t) \right)}_{\text{Constant term}} \right] dt + g(t) d\bar{\mathbf{w}}\) Thus, the reverse drift is linear in $\mathbf{x}$.
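The regrouping is a pure algebraic identity; in the scalar case it can be machine-checked, e.g. with sympy (the symbols below are illustrative scalar stand-ins for $\mathbf{A}$, $\mathbf{b}$, $g^2$, $\boldsymbol{\Sigma}$, $\boldsymbol{\mu}$):
```python
# Scalar sympy check that the drift regrouping is an identity:
#   A*x + b + g2/S*(x - m)  ==  (A + g2/S)*x + (b - g2/S*m)
import sympy as sp

A, b, g2, S, m, x = sp.symbols('A b g2 S m x')
lhs = A * x + b + (g2 / S) * (x - m)
rhs = (A + g2 / S) * x + (b - (g2 / S) * m)
print(sp.simplify(lhs - rhs))   # prints 0
```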
4. Solving for Mean and Variance
Assume the reverse process is Gaussian (inherited from the forward process). Let $\boldsymbol{\mu}_r(t)$ and $\boldsymbol{\Sigma}_r(t)$ be the mean and covariance of the reverse process. Since the reverse SDE is linear, the moments evolve as: \(\frac{d\boldsymbol{\mu}_r}{dt} = \left( \mathbf{A}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \right) \boldsymbol{\mu}_r + \mathbf{b}(t) - g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \boldsymbol{\mu}(t)\) \(\frac{d\boldsymbol{\Sigma}_r}{dt} = \left( \mathbf{A}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \right) \boldsymbol{\Sigma}_r + \boldsymbol{\Sigma}_r \left( \mathbf{A}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \right)^\top + g(t)^2 \mathbf{I}\) Key Insight: The marginal distributions of the forward and reverse processes must match at each $t$ (i.e., $p_t(\mathbf{x}) = p_{r,t}(\mathbf{x})$). Thus: \(\boldsymbol{\mu}_r(t) = \boldsymbol{\mu}(t), \quad \boldsymbol{\Sigma}_r(t) = \boldsymbol{\Sigma}(t)\) Substitute $\boldsymbol{\mu}_r = \boldsymbol{\mu}$ and $\boldsymbol{\Sigma}_r = \boldsymbol{\Sigma}$ into the moment equations.
Mean Evolution
\(\frac{d\boldsymbol{\mu}}{dt} = \left( \mathbf{A}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \right) \boldsymbol{\mu} + \mathbf{b}(t) - g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \boldsymbol{\mu}\) Simplify: \(\frac{d\boldsymbol{\mu}}{dt} = \mathbf{A}(t) \boldsymbol{\mu} + \mathbf{b}(t)\) This matches the forward mean ODE (from Itô’s lemma on the forward SDE).
Variance Evolution
\(\frac{d\boldsymbol{\Sigma}}{dt} = \left( \mathbf{A}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \right) \boldsymbol{\Sigma} + \boldsymbol{\Sigma} \left( \mathbf{A}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \right)^\top + g(t)^2 \mathbf{I}\) Expand: \(\frac{d\boldsymbol{\Sigma}}{dt} = \mathbf{A}(t) \boldsymbol{\Sigma} + \boldsymbol{\Sigma} \mathbf{A}^\top(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \boldsymbol{\Sigma} + g(t)^2 \boldsymbol{\Sigma} \boldsymbol{\Sigma}^{-1}(t) + g(t)^2 \mathbf{I}\) Simplify (using $\boldsymbol{\Sigma}^{-1} \boldsymbol{\Sigma} = \boldsymbol{\Sigma} \boldsymbol{\Sigma}^{-1} = \mathbf{I}$): \(\frac{d\boldsymbol{\Sigma}}{dt} = \mathbf{A}(t) \boldsymbol{\Sigma} + \boldsymbol{\Sigma} \mathbf{A}^\top(t) + 3g(t)^2 \mathbf{I}\) But the forward variance ODE (from Itô’s lemma) is: \(\frac{d\boldsymbol{\Sigma}}{dt} = \mathbf{A}(t) \boldsymbol{\Sigma} + \boldsymbol{\Sigma} \mathbf{A}^\top(t) + g(t)^2 \mathbf{I}\) Contradiction? The reverse equation has $3g(t)^2 \mathbf{I}$ while the forward has $g(t)^2 \mathbf{I}$. This arises because the reverse SDE is defined for $dt < 0$ (backward in time). To resolve it, reparameterize time.
5. Time Reparameterization
Define reverse time $\tau = T - t$ (so $d\tau = -dt$). The reverse SDE in $\tau$ (forward in $\tau$) is: \(d\mathbf{x} = -\left[ \mathbf{f}(\mathbf{x}, T-\tau) - g(T-\tau)^2 \nabla_{\mathbf{x}} \log p_{T-\tau}(\mathbf{x}) \right] d\tau + g(T-\tau) d\mathbf{w}_\tau\) Using the Gaussian score and linear drift: \(d\mathbf{x} = -\left[ \mathbf{A}(T-\tau) \mathbf{x} + \mathbf{b}(T-\tau) + g(T-\tau)^2 \boldsymbol{\Sigma}^{-1}(T-\tau) (\mathbf{x} - \boldsymbol{\mu}(T-\tau)) \right] d\tau + g(T-\tau) d\mathbf{w}_\tau\) The moment ODEs in $\tau$ (standard Itô, since $d\tau > 0$), with all coefficients evaluated at $t = T-\tau$, are: \(\frac{d\boldsymbol{\mu}_r}{d\tau} = -\mathbf{A} \boldsymbol{\mu}_r - \mathbf{b} - g^2 \boldsymbol{\Sigma}^{-1} (\boldsymbol{\mu}_r - \boldsymbol{\mu})\) \(\frac{d\boldsymbol{\Sigma}_r}{d\tau} = -\left(\mathbf{A} + g^2 \boldsymbol{\Sigma}^{-1}\right) \boldsymbol{\Sigma}_r - \boldsymbol{\Sigma}_r \left(\mathbf{A} + g^2 \boldsymbol{\Sigma}^{-1}\right)^\top + g^2 \mathbf{I}\) Since $\boldsymbol{\mu}_r(\tau) = \boldsymbol{\mu}(T-\tau)$ and $\boldsymbol{\Sigma}_r(\tau) = \boldsymbol{\Sigma}(T-\tau)$, differentiating with respect to $\tau$ gives: \(\frac{d\boldsymbol{\mu}_r}{d\tau} = -\frac{d\boldsymbol{\mu}}{dt}\bigg|_{t=T-\tau}, \quad \frac{d\boldsymbol{\Sigma}_r}{d\tau} = -\frac{d\boldsymbol{\Sigma}}{dt}\bigg|_{t=T-\tau}\) Substitute into the ODEs and equate to the forward equations. For the mean ($\boldsymbol{\mu}_r = \boldsymbol{\mu}$, so the score term drops): \(-\frac{d\boldsymbol{\mu}}{dt} = -\mathbf{A}(t) \boldsymbol{\mu} - \mathbf{b}(t) \implies \frac{d\boldsymbol{\mu}}{dt} = \mathbf{A}(t) \boldsymbol{\mu} + \mathbf{b}(t)\) Matches forward. For the variance ($\boldsymbol{\Sigma}_r = \boldsymbol{\Sigma}$, so $g^2 \boldsymbol{\Sigma}^{-1} \boldsymbol{\Sigma} + \boldsymbol{\Sigma} (g^2 \boldsymbol{\Sigma}^{-1})^\top = 2g^2 \mathbf{I}$): \(-\frac{d\boldsymbol{\Sigma}}{dt} = -\mathbf{A}(t) \boldsymbol{\Sigma} - \boldsymbol{\Sigma} \mathbf{A}^\top(t) - g(t)^2 \mathbf{I} \implies \frac{d\boldsymbol{\Sigma}}{dt} = \mathbf{A}(t) \boldsymbol{\Sigma} + \boldsymbol{\Sigma} \mathbf{A}^\top(t) + g(t)^2 \mathbf{I}\) Matches forward. The spurious factor of 3 disappears with the correct time reversal.
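To see the bookkeeping work numerically, the following scalar sketch (our addition; $A$, $g$, $T$, and $\Sigma(0)$ are arbitrary values) integrates the forward variance ODE, then the reverse-time moment ODE in $\tau$, and confirms $\Sigma_r(\tau) = \Sigma(T-\tau)$ by recovering $\Sigma(0)$ at $\tau = T$:
```python
# Scalar check of the time reversal: integrate the forward variance ODE
#   dSigma/dt = 2*A*Sigma + g^2,
# then the reverse moment ODE in tau, and confirm Sigma_r recovers Sigma(0).
import numpy as np

A, g, T = -0.5, 1.0, 2.0
n = 20_000
dt = T / n

Sigma = np.empty(n + 1)                  # forward pass: Sigma(t) on a grid
Sigma[0] = 2.0
for i in range(n):
    Sigma[i + 1] = Sigma[i] + dt * (2 * A * Sigma[i] + g**2)

Sr = Sigma[-1]                           # reverse pass: Sigma_r(0) = Sigma(T)
for i in range(n):
    S_fwd = Sigma[n - i]                 # Sigma(T - tau) at tau = i*dt
    Sr += dt * (-2 * (A + g**2 / S_fwd) * Sr + g**2)

print(f"Sigma_r(T) = {Sr:.5f}   vs   Sigma(0) = {Sigma[0]:.5f}")
```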
Conclusion
- Score Function: $\nabla_{\mathbf{x}} \log p_t(\mathbf{x}) = -\boldsymbol{\Sigma}^{-1}(t) (\mathbf{x} - \boldsymbol{\mu}(t))$ (under Gaussian assumption).
- Reverse SDE: Linear drift grouped as: \(d\mathbf{x} = \left[ \left( \mathbf{A}(t) + g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \right) \mathbf{x} + \mathbf{b}(t) - g(t)^2 \boldsymbol{\Sigma}^{-1}(t) \boldsymbol{\mu}(t) \right] dt + g(t) d\bar{\mathbf{w}}\)
- Mean and Variance: Solved from forward ODEs: \(\frac{d\boldsymbol{\mu}}{dt} = \mathbf{A}(t) \boldsymbol{\mu} + \mathbf{b}(t), \quad \frac{d\boldsymbol{\Sigma}}{dt} = \mathbf{A}(t) \boldsymbol{\Sigma} + \boldsymbol{\Sigma} \mathbf{A}^\top(t) + g(t)^2 \mathbf{I}\) with initial conditions $\boldsymbol{\mu}(0), \boldsymbol{\Sigma}(0)$. The reverse process uses the same $\boldsymbol{\mu}(t), \boldsymbol{\Sigma}(t)$ via time reparameterization. This proves consistency without assuming forward-reverse symmetry.