A few frequently used formulas:
Conservation of probability:
\(\frac{d}{dt} \int p(x,t)\, dx = \int \frac{\partial p(x,t)}{\partial t}\, dx = 0\)

\(\mathbb{E}_{p} \left[ \frac{\partial}{\partial t} \log p(x,t) \right] = \int p(x,t) \cdot \frac{\partial}{\partial t} \log p(x,t)\, dx = \int \frac{\partial p(x,t)}{\partial t}\, dx = 0\)

Entropy and its time evolution:

\(H(t) = - \mathbb{E}_p\left[\log p(x,t)\right] = -\int p(x,t)\log p(x,t)\,dx\)

(Note: as a differential entropy this need not be positive, e.g. for a narrow Gaussian.)

\(\begin{aligned} \frac{d H(t)}{dt} &= -\int \frac{\partial p(x,t)}{\partial t}\log p(x,t)\,dx \\ &= \int \left[\nabla \cdot[\boldsymbol{u}(x,t)\, p(x,t)]-D(t) \Delta p(x,t)\right] \log p(x,t)\,dx \\ &= \underbrace{\mathbb{E}[\nabla \cdot \boldsymbol{u}(x,t)]}_{\text{Drift contribution}} + \underbrace{D(t) I(p)}_{\text{Diffusion contribution}}, \end{aligned}\)

with Fisher information $I(p) = \int p \|\nabla \log p\|^2 dx \ge 0$.
The Fokker-Planck partial differential equation (PDE)
In the diffusion process or flow methods of generative AI, the conserved quantity is probability (it sums to 1 at every time): $\Phi = p(x, t)$, with no source, $S=0$. Ordinary physical diffusion runs from high concentration to low concentration, whereas here probability diffuses from low probability toward high probability; this can be viewed equivalently as a negative diffusion constant, $D = -\Gamma$. If we assume isotropic diffusion, $D(t)$ is independent of position but may depend on time (noise scheduling). The equation is usually written as: \(\frac{\partial p(x,t)}{\partial t}=-\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]+\nabla \cdot[D(t) \nabla p(x,t)] = -\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]+D(t) \Delta p(x,t)\) Another representation:
\[\begin{aligned} \frac{\partial\log p(x,t)}{\partial t}&= \frac{1}{p(x,t)}\frac{\partial p(x,t)}{\partial t}\\ &= -\frac{\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]}{p(x,t)}+D(t) \frac{\Delta p(x,t)}{p(x,t)}\\ &= -\nabla \cdot \mathbf{u}(x,t)-\mathbf{u}(x,t)\cdot \nabla\log p(x,t) +D(t) \frac{\Delta p(x,t)}{p(x,t)}\\ &= -\mathbf{u}(x,t)\cdot \nabla\log p(x,t) \underbrace{- \nabla \cdot \mathbf{u}(x,t) +D(t)\left[\Delta \log p(x,t) + \| \nabla \log p(x,t) \|^2\right]}_{=\frac{d\log p(x,t)}{d t}}\\ \end{aligned}\]The Fokker-Planck equation in total-derivative form (not commonly used; error-prone, use with caution)
There is one more representation, using the total (material) derivative as an ODE; the two are equivalent.
- Partial derivative: $p$ is a function of $x, t$. This corresponds to the (Eulerian) stationary-observer viewpoint.
- Total derivative: $p$ is a function of $x(t), t$, hence ultimately a function of $t$ alone. This corresponds to the (Lagrangian) viewpoint that moves with a particle or with the flow $x_t$.
The relation between the partial and total derivatives: \(\frac{d f(\mathbf{x},t)}{dt} = \frac{\partial f(\mathbf{x},t)}{\partial t} + \frac{d\mathbf{x}}{dt}\cdot \nabla f(\mathbf{x},t) = (\frac{\partial}{\partial t} + \frac{d\mathbf{x}}{dt}\cdot \nabla) f(\mathbf{x},t) = (\frac{\partial}{\partial t} + \mathbf{u}\cdot \nabla) f(\mathbf{x},t)\)
For the total derivative we can replace $p(x,t)$ by $\log p(x,t)$; see Appendix A. \(\begin{aligned} \frac{d \log p(x,t)}{d t}&= -\nabla \cdot \mathbf{u}(x,t) + \nabla\cdot [D(t)\nabla \log p(x,t)] + D(t) \| \nabla \log p(x,t) \|^2\\ &= -\nabla \cdot \mathbf{u}(x,t) + D(t)\left[\Delta \log p(x,t) + \| \nabla \log p(x,t) \|^2\right] \end{aligned}\) The relation between the partial and total derivatives is: \(\begin{aligned} \frac{d \log p(x,t)}{d t}&= \frac{\partial \log p(x,t)}{\partial t} + \mathbf{u}(x,t) \cdot \nabla\log p(x,t) \end{aligned}\) Another, less common representation: \(\begin{aligned} \frac{d p(x,t)}{d t}&= \frac{\partial p(x,t)}{\partial t} + \mathbf{u}(x,t) \cdot \nabla p(x,t)\\ &=-\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]+\nabla \cdot[D(t) \nabla p(x,t)] + \mathbf{u}(x,t) \cdot \nabla p(x,t)\\ &= -\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]+D(t) \Delta p(x,t) + \mathbf{u}(x,t) \cdot \nabla p(x,t)\\ &= -[\nabla \cdot \mathbf{u}(x,t)]\, p(x,t)+D(t) \Delta p(x,t) \end{aligned}\)
[!Special Case: No Diffusion, Flow Only]
For flow only (no diffusion), $D = 0$, this simplifies to:
\(\frac{\partial p(x,t)}{\partial t}=-\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)] \quad \text{ or }\frac{d p(x,t)}{d t}=-[\nabla \cdot \mathbf{u}(x,t)]\, p(x,t)\) \(\frac{d \log p(x,t)}{d t}= -\nabla \cdot \mathbf{u}(x,t) \quad \text{ or }\frac{\partial \log p(x,t)}{\partial t}=- \nabla \cdot \mathbf{u}(x,t)-\mathbf{u}(x,t)\cdot \nabla\log p(x,t)\)
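This special case can be verified numerically. A minimal sketch, assuming a 1D linear flow $u(x) = a x$ (so $\nabla \cdot u = a$) with a Gaussian initial density: the continuity equation then has the closed-form solution $p(x,t) = \mathcal{N}(0, \sigma_0^2 e^{2at})$, and along any trajectory $d \log p / dt$ should come out as $-a$.

```python
# Numerical check of d/dt log p(x_t, t) = -div u for the flow-only case.
# Toy setup (assumed for illustration): 1D linear flow u(x) = a*x, so
# div u = a, with Gaussian initial density p0 = N(0, sigma0^2).
import numpy as np

a, sigma0 = 0.7, 1.0

def log_p(x, t):
    var = sigma0**2 * np.exp(2 * a * t)      # variance grows with the flow
    return -x**2 / (2 * var) - 0.5 * np.log(2 * np.pi * var)

# Follow one particle along the flow: dx/dt = a x  =>  x(t) = x0 e^{at}
x0, t, dt = 1.3, 0.5, 1e-5
x_t  = x0 * np.exp(a * t)
x_t2 = x0 * np.exp(a * (t + dt))

# Total (material) derivative via finite difference along the trajectory
dlogp_dt = (log_p(x_t2, t + dt) - log_p(x_t, t)) / dt
print(dlogp_dt, -a)   # both approx. -0.7, i.e. d log p/dt = -div u
```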
The microscopic sample $x_t$: "total derivative" SDE
The above used the ODE form, describing the (macroscopic) average flow $p(x,t)$. The alternative is an SDE, describing the (microscopic) random motion of an individual sample $x_t$. An individual sample is necessarily a function of $t$, so it uses the total-derivative representation.
The most general form is given below. Here $\sigma(x_t, t)$ is the "incremental" additive noise; we will later rename it $g(t)$, because it is easily confused with the "total" additive Gaussian noise $\boldsymbol{x}_t = \boldsymbol{x}_0 + \sigma(t) \boldsymbol{z}_t$! \(d \boldsymbol{x}_t = \mathbf{u}(\boldsymbol{x}_t, t) d t+\sigma(\boldsymbol{x}_t, t) d \boldsymbol{w}_t\) The corresponding Fokker-Planck equation: \(\frac{\partial p(x,t)}{\partial t}= -\nabla \cdot[\mathbf{u}(x,t)\, p(x,t)]+ \Delta [D(x, t) p(x,t)]\quad \text{ where }D(x,t) = \frac{\sigma^2(x,t)}{2}\) Switching to the isotropic form, we replace $\sigma(\boldsymbol{x}_t, t)$ with $g(t)$ and write $\boldsymbol{u}(\boldsymbol{x}_t,t)$ as $\boldsymbol{f}(\boldsymbol{x}_t,t)$: \(d \boldsymbol{x}_t = \boldsymbol{f}(\boldsymbol{x}_t, t) d t+g(t) d \boldsymbol{w}_t\quad \text{ where }D(t) = \frac{g^2(t)}{2}\) The corresponding PDE: \(\frac{\partial p(x,t)}{\partial t}=-\nabla \cdot[\boldsymbol{f}(x,t)\, p(x,t)]+\frac{g^2(t)}{2} \Delta p(x,t)\) The total-derivative ODE: \(\frac{d \log p(x,t)}{d t}=-\nabla \cdot\boldsymbol{f}(x,t)+\frac{g^2(t)}{2} [ \Delta \log p(x,t) + \|\nabla\log p(x,t)\|^2]\)
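The SDE-to-Fokker-Planck correspondence can be sanity-checked by simulation. A minimal Euler–Maruyama sketch, assuming an Ornstein–Uhlenbeck toy case ($\boldsymbol{f}(x) = -\theta x$, constant $g$): the empirical variance of the microscopic samples should track the macroscopic variance ODE $\frac{d\sigma^2}{dt} = -2\theta\sigma^2 + g^2$ implied by the PDE.

```python
# Euler-Maruyama simulation of dx = f(x,t) dt + g(t) dw on an assumed toy
# case (Ornstein-Uhlenbeck): f(x) = -theta*x with constant g.  The matching
# Fokker-Planck equation predicts a Gaussian p(x,t) whose variance obeys
# d(sigma^2)/dt = -2*theta*sigma^2 + g^2.
import numpy as np

rng = np.random.default_rng(0)
theta, g = 1.0, 1.0
n, dt, steps = 50_000, 1e-3, 3000           # 50k samples, T = 3

x = np.zeros(n)                             # microscopic samples (delta at 0)
var = 0.0                                   # macroscopic variance ODE
for _ in range(steps):
    x += -theta * x * dt + g * np.sqrt(dt) * rng.standard_normal(n)
    var += (-2 * theta * var + g**2) * dt

print(x.var(), var, g**2 / (2 * theta))     # all approx. 0.5
```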
Conservation of Probability
Appendix P proves that the Fokker-Planck equation conserves probability (the total sums to 1). Note that everything there uses partial derivatives, the "stationary observer's" global view, not the total derivative, the "moving particle's" view.
\(\frac{d}{dt} \int p(x,t)\, dx = \int \frac{\partial p(x,t)}{\partial t}\, dx = 0\) The proof shows that the drift and diffusion contributions to the time derivative each integrate to zero. Does that mean each conserves probability on its own? What is the physical meaning? It presumably does not mean that, say, 40% of the probability belongs to drift and 60% to diffusion, each conserved separately. A better analogy is dropping ink into water while stirring up a flow field: the total amount of ink is unchanged (total probability conservation). But is the amount of ink spread by diffusion and the amount spread by the stirring drift each separately unchanged? Or perhaps the question is meaningless, because the conservation of both the drift and diffusion terms rests on surface integrals over an infinitely large boundary vanishing.
\[\int \nabla \cdot (\boldsymbol{u} p)\, dx = 0\] \[\int \Delta p\, dx = \int \nabla \cdot (\nabla p)\, dx = \oint_{\partial \Omega} \nabla p \cdot \hat{n}\, dS = 0\]Note the contrast with the total derivative, the "moving-particle view": if I am a particle inside a compressive flow field ($\nabla\cdot \mathbf{u}<0$), does integrating the total derivative give an increasing amount of probability?
\(\begin{aligned} \int\frac{d p(x,t)}{d t} dx&= \underbrace{\int \frac{\partial p(x,t)}{\partial t} dx}_{=0}+ \int \mathbf{u}(x,t) \cdot \nabla p(x,t)dx\\ &= -\int[\nabla \cdot \mathbf{u}(x,t)]\, p(x,t) dx \ne 0 \end{aligned}\) In other words, \(\frac{d}{dt} \int p(x,t)\, dx = \int \frac{\partial p(x,t)}{\partial t}\, dx \ne \int \frac{d p(x,t)}{d t}\, dx\)
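This discrepancy can be checked by quadrature, on the same kind of assumed toy case as before (1D linear flow $u(x) = a x$ with the exact solution $p(x,t) = \mathcal{N}(0, \sigma_0^2 e^{2at})$): the Eulerian integral vanishes while the Lagrangian one equals $-\int (\nabla\cdot\mathbf{u})\,p\,dx = -a$.

```python
# Quadrature check: the Eulerian integral of dp/dt vanishes, the Lagrangian
# one does not.  Toy case assumed for illustration: u(x) = a*x, Gaussian p.
import numpy as np

a, sigma0, t, dt = 0.7, 1.0, 0.5, 1e-6
x = np.linspace(-30.0, 30.0, 200_001)
h = x[1] - x[0]

def p(x, t):
    var = sigma0**2 * np.exp(2 * a * t)
    return np.exp(-x**2 / (2 * var)) / np.sqrt(2 * np.pi * var)

dp_dt_partial = (p(x, t + dt) - p(x, t)) / dt                   # Eulerian dp/dt
dp_dt_total = dp_dt_partial + a * x * np.gradient(p(x, t), x)   # + u . grad p

print(dp_dt_partial.sum() * h)   # approx. 0    (conservation of probability)
print(dp_dt_total.sum() * h)     # approx. -0.7 (= -int (div u) p dx = -a)
```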
A completely equivalent statement of the conservation of probability (Appendix Q):
\(\mathbb{E}_{p} \left[ \frac{\partial}{\partial t} \log p(x,t) \right] = \int p(x,t) \cdot \frac{\partial}{\partial t} \log p(x,t)\, dx = \int \frac{\partial p(x,t)}{\partial t}\, dx = 0\) and not \(\frac{d}{dt} \log p(x(t), t) = \frac{\partial}{\partial t} \log p(x,t) + \nabla \log p(x,t) \cdot \frac{dx}{dt}\)
So the expectation becomes:
\[\mathbb{E}_p \left[ \frac{d}{dt} \log p(x(t),t) \right] = \underbrace{\mathbb{E} \left[ \frac{\partial}{\partial t} \log p(x,t) \right]}_{=0 \text{ prob. conservation} } + \mathbb{E} \left[ \nabla \log p(x,t) \cdot \frac{dx}{dt} \right]\] \[\mathbb{E}_p \left[ \frac{d}{dt} \log p(x(t),t) \right] = \mathbb{E}_p \left[ \boldsymbol{u}(x,t) \cdot \nabla \log p(x,t) \right] \ne 0\]The link between Fokker-Planck and entropy (microscopic thermodynamics)
Consider the entropy: \(H(t) = - \mathbb{E}_p\left[\log p(x,t)\right] = -\int p(x,t)\log p(x,t)\,dx\) (as a differential entropy this need not be positive). The point of interest is how the entropy changes with time. Note (ChatGPT, Appendix O): $\frac{dH}{dt}$ is the global (Eulerian) view, so when the derivative is exchanged with the integral we use the partial derivative (fixed-observer viewpoint) $\frac{\partial p}{\partial t}$, not the total derivative (moving-particle viewpoint) $\frac{dp}{dt}$. I still have some doubts about this point.
\[\begin{aligned} \frac{d H(t)}{dt} &= -\int \left[\frac{\partial p(x,t)}{\partial t}\log p(x,t) + p(x,t) \frac{1}{p(x,t)}\frac{\partial p(x,t)}{\partial t}\right]dx\\ &= -\int \frac{\partial p(x,t)}{\partial t}\log p(x,t)dx - \int \frac{\partial p(x,t)}{\partial t}dx\\ &= -\int \frac{\partial p(x,t)}{\partial t}\log p(x,t)dx - \frac{d}{d t}\underbrace{\int p(x,t) dx}_{=1}\\ &= -\int \frac{\partial p(x,t)}{\partial t}\log p(x,t)dx \\ &= \int \left[\nabla \cdot[\boldsymbol{f}(x,t)\, p(x,t)]-\frac{g^2(t)}{2} \Delta p(x,t)\right] \log p(x,t)dx \\ \end{aligned}\]The drift term simplifies to the average drift divergence (Appendix M):
\[\int \nabla \cdot(\boldsymbol{f}(x,t)\, p(x,t)) \log p(x,t) \, dx = \int (\nabla \cdot \boldsymbol{f}(x,t))\, p(x,t) \, dx = \mathbb{E}_p [\nabla \cdot \boldsymbol{f}(x,t)]\]The diffusion term simplifies to:
\[-\frac{g^2(t)}{2} \int \Delta p(x,t) \log p(x,t)\, dx = \frac{g^2(t)}{2} \int p(x,t) \left\|\nabla \log p(x,t)\right\|^2 dx\]where the Fisher information is $I(p) = \int p \|\nabla \log p\|^2 dx \ge 0$.
Combining both terms:
\[\frac{d H(t)}{dt} = \int \nabla \cdot \boldsymbol{f}(x,t)\, p(x,t)\, dx + \frac{g^2(t)}{2} \int p(x,t) \left\|\nabla \log p(x,t)\right\|^2 dx\] \[\frac{dH(t)}{dt} = \underbrace{\mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)]}_{\text{Drift contribution}} + \underbrace{\frac{g^2(t)}{2} I(p)}_{\text{Diffusion contribution}},\]where:
- $\mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)]$ is the expected divergence of the drift field $\boldsymbol{f}(x,t)$.
- $\frac{g^2(t)}{2} I(p)$ is the diffusion term, with $I(p) = \mathbb{E}\left[|\nabla \log p(x,t)|^2\right] \geq 0$ (Fisher information, always non-negative).
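This decomposition can be checked on a case where everything is in closed form. A minimal sketch, again assuming the OU toy case $\boldsymbol{f}(x) = -\theta x$ with constant $g$: $p(x,t)$ stays Gaussian, so $H(t) = \frac{1}{2}\log(2\pi e \sigma^2(t))$ and $I(p) = 1/\sigma^2(t)$, and a finite-difference $dH/dt$ should match $\mathbb{E}[\nabla\cdot\boldsymbol{f}] + \frac{g^2}{2} I(p) = -\theta + \frac{g^2}{2\sigma^2(t)}$.

```python
# Closed-form check of dH/dt = E[div f] + (g^2/2) I(p) on an assumed OU toy
# case f(x) = -theta*x with constant g: p(x,t) stays Gaussian with variance
# sigma^2(t), H(t) = 0.5*log(2*pi*e*sigma^2(t)), I(p) = 1/sigma^2(t).
import numpy as np

theta, g, s0 = 1.0, 1.0, 2.0
t, dt = 0.8, 1e-6

def var(t):    # solves d(var)/dt = -2*theta*var + g^2, var(0) = s0^2
    return s0**2 * np.exp(-2*theta*t) + g**2 / (2*theta) * (1 - np.exp(-2*theta*t))

def H(t):      # 1D Gaussian differential entropy
    return 0.5 * np.log(2 * np.pi * np.e * var(t))

lhs = (H(t + dt) - H(t)) / dt                # dH/dt by finite difference
rhs = -theta + (g**2 / 2) / var(t)           # E[div f] + (g^2/2) I(p)
print(lhs, rhs)                              # agree to ~1e-6
```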
How entropy changes over time
1. Diffusion term: $\frac{g^2(t)}{2} I(p)$
- Always non-negative, since $g^2(t) \ge 0$ and Fisher information $I(p) \ge 0$.
- Represents the entropy-increasing effect of diffusion (spreading out the distribution).
- Strictly positive unless $I(p) = 0$, i.e. $\nabla \log p = 0$ almost everywhere (a uniform distribution on its support); for a Dirac delta the Fisher information instead diverges.
2. Drift term: $\mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)]$
- Can be positive, negative, or zero, depending on the vector field $\boldsymbol{f}(x,t)$.
- If $\nabla \cdot \boldsymbol{f} < 0$ (e.g., a contracting flow), this term decreases entropy. This typically happens in reverse diffusion (training and sampling).
- If $\nabla \cdot \boldsymbol{f} > 0$ (e.g., an expanding flow), this term increases entropy. This typically happens in forward diffusion (training).
Overall sign of $\frac{dH}{dt}$?
We cannot assert the sign of $\frac{dH}{dt}$ in general, because it depends on the balance between drift and diffusion:
- If diffusion dominates (large $g(t)$, or small $\nabla \cdot \boldsymbol{f}$), entropy increases: $\frac{dH}{dt} > 0$
- If drift dominates, and especially if it’s compressive: $\frac{dH}{dt} < 0$
- If they balance: $\frac{dH}{dt} = 0$, which can happen in stationary cases
Special case: Pure diffusion (no drift)
If $\boldsymbol{f}(x,t) = 0$, then:
\[\frac{dH(t)}{dt} = \frac{g^2(t)}{2} I(p) \ge 0\]So entropy never decreases, and strictly increases unless $I(p) = 0$ (a uniform distribution); this is consistent with the heat equation, where a peaked distribution spreads out over time.
Special case: Pure Gaussian diffusion (no drift)
\(p(x) = \frac{1}{\sqrt{2\pi \sigma^2}} \exp\left(-\frac{(x - \mu)^2}{2\sigma^2}\right)\) Then the squared norm:
\[\left(\frac{d}{dx} \log p(x)\right)^2 = \frac{(x - \mu)^2}{\sigma^4}\]Take the expectation under $p(x)$:
\(I(p) = \int p(x) \left( \frac{x - \mu}{\sigma^2} \right)^2 dx = \frac{1}{\sigma^4} \int p(x) (x - \mu)^2 dx = \frac{1}{\sigma^4} \cdot \sigma^2 = \boxed{\frac{1}{\sigma^2}}\) Gaussian distribution entropy \(H(p) = \frac{1}{2}\log(2\pi \sigma^2) + \frac{1}{2}\)
[!Multivariate version:]
If $p(x,t) = \mathcal{N}(\mu(t), \Sigma(t)) \in \mathbb{R}^d$, then:
\(I(p) = \int p(x)\, \|\nabla \log p(x)\|^2 dx = \operatorname{Tr}(\Sigma^{-1})\) because:
\[\nabla \log p(x) = -\Sigma^{-1}(x - \mu), \quad \text{so } \|\nabla \log p\|^2 = (x - \mu)^T \Sigma^{-2} (x - \mu)\]Then take expectation:
\(I(p) = \mathbb{E}[(x - \mu)^T \Sigma^{-2} (x - \mu)] = \operatorname{Tr}(\Sigma^{-2} \mathbb{E}[(x - \mu)(x - \mu)^T]) = \operatorname{Tr}(\Sigma^{-2} \Sigma) = \operatorname{Tr}(\Sigma^{-1})\) For isotropic diffusion $\Sigma(t) = \sigma^2(t) I$, this gives $I(p) = \frac{d}{\sigma^2(t)}$. Here $d$ is the dimension, not a differential symbol!
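A Monte Carlo sketch of the identity $I(p) = \operatorname{Tr}(\Sigma^{-1})$ (the covariance matrix below is a random SPD matrix, an assumption for illustration only):

```python
# Monte Carlo check of I(p) = Tr(Sigma^{-1}) for a multivariate Gaussian.
# The covariance below is a random SPD matrix, assumed purely for illustration.
import numpy as np

rng = np.random.default_rng(0)
d = 3
A = rng.standard_normal((d, d))
Sigma = A @ A.T + d * np.eye(d)                 # random SPD covariance
Sigma_inv = np.linalg.inv(Sigma)

X = rng.multivariate_normal(np.zeros(d), Sigma, size=200_000)
scores = -X @ Sigma_inv                         # score = -Sigma^{-1} (x - mu)
I_mc = (scores**2).sum(axis=1).mean()           # E ||grad log p||^2

print(I_mc, np.trace(Sigma_inv))                # both approx. Tr(Sigma^{-1})
```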
Entropy change from pure diffusion: \(\frac{dH(t)}{dt} = \frac{g^2(t)}{2} I(p) = \frac{g^2(t)}{2\sigma^2(t)} = \frac{1}{2\sigma^2(t)}\frac{d\sigma^2(t)}{dt} = \frac{1}{2}\frac{d\log\sigma^2(t)}{dt},\) using $\frac{d\sigma^2(t)}{dt} = g^2(t)$ for pure diffusion. Computing the entropy change directly: \(\frac{dH(t)}{dt} = \frac{1}{2}\frac{d \log (2\pi \sigma^2(t))}{dt} = \frac{1}{2}\frac{d\log\sigma^2(t)}{dt}\) The two agree.
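The same consistency holds for a time-dependent noise schedule. A minimal sketch, assuming the schedule $g(t) = t$, which gives $\sigma^2(t) = \sigma_0^2 + t^3/3$ under pure diffusion:

```python
# Pure diffusion with a time-dependent schedule, assumed g(t) = t for
# illustration: then sigma^2(t) = sigma0^2 + t^3/3, and dH/dt should equal
# g^2(t) / (2 sigma^2(t)) = (1/2) d log sigma^2 / dt.
import numpy as np

s0, t, dt = 1.0, 1.2, 1e-6
var = lambda t: s0**2 + t**3 / 3
H = lambda t: 0.5 * np.log(2 * np.pi * np.e * var(t))

print((H(t + dt) - H(t)) / dt)    # dH/dt by finite difference
print(t**2 / (2 * var(t)))        # g^2(t) / (2 sigma^2(t)): same value
```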
Special case: Divergence-Free (Incompressible) Flow
If $\nabla\cdot \boldsymbol{f}(x,t) = 0$, then:
\[\frac{dH(t)}{dt} = \frac{g^2(t)}{2} I(p) \ge 0\]So entropy never decreases, and strictly increases unless $I(p) = 0$ (a uniform distribution); this is consistent with the heat equation, where a peaked distribution spreads out over time.
Special case: Deterministic flow (no diffusion)
If $g(t) = 0$, then:
\[\frac{dH(t)}{dt} = \mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)]\]This can be positive, negative, or 0 depending on whether the deterministic flow expands or contracts space.
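A quick worked example (the linear flow here is chosen purely for illustration): for $\boldsymbol{f}(x) = a x$ in $\mathbb{R}^d$, $\nabla \cdot \boldsymbol{f} = ad$ is constant, so

\[\frac{dH(t)}{dt} = \mathbb{E}[\nabla \cdot \boldsymbol{f}] = a d \quad\Rightarrow\quad H(t) = H(0) + a d\, t.\]

This matches the exact transport solution $x_t = e^{at} x_0$, $p(x,t) = e^{-adt}\, p_0(e^{-at} x)$: an expanding flow ($a > 0$) increases entropy linearly in time, and a contracting one ($a < 0$) decreases it.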
Special Case: Gradient Flow
If the drift is a gradient of a potential $\boldsymbol{f}(x,t) = -\nabla V(x)$, then: \(\nabla \cdot \boldsymbol{f}(x,t) = -\Delta V(x),\) and the sign depends on the Laplacian of $V$:
- If $\Delta V(x) > 0$ (subharmonic potential), $\mathbb{E}[\nabla \cdot \boldsymbol{f}] < 0$, which can lead to entropy decrease.
- If $\Delta V(x) < 0$ (superharmonic potential), $\mathbb{E}[\nabla \cdot \boldsymbol{f}] > 0$, increasing entropy.
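A quick worked example (the quadratic potential is chosen purely for illustration): $V(x) = \frac{\lambda}{2}\|x\|^2$ with $\lambda > 0$ gives

\[\boldsymbol{f}(x) = -\nabla V(x) = -\lambda x, \qquad \Delta V = \lambda d > 0, \qquad \mathbb{E}[\nabla \cdot \boldsymbol{f}] = -\lambda d < 0,\]

i.e. the familiar OU drift: a subharmonic potential, a contracting flow, and an entropy-decreasing drift term.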
Conclusion
We cannot generally say whether $\frac{dH}{dt}$ is positive or negative without knowing more about the drift and diffusion. However:
- Diffusion always increases entropy
- Drift can increase or decrease entropy, depending on whether it compresses or expands probability mass.
This interplay is fundamental in stochastic processes and nonequilibrium thermodynamics.
Appendix M
We will continue the derivation from:
\[\frac{d H(t)}{dt} = \int \left[\nabla \cdot(\boldsymbol{f}(x,t)\, p(x,t)) - \frac{g^2(t)}{2} \Delta p(x,t)\right] \log p(x,t)\, dx\]We now handle the two terms in the integrand separately.
1. The drift term:
\[\int \nabla \cdot(\boldsymbol{f}(x,t)\, p(x,t)) \log p(x,t) \, dx\]Use integration by parts (divergence theorem) in reverse, assuming boundary terms vanish (e.g., decay at infinity):
\[\int \nabla \cdot(\boldsymbol{f}\, p) \log p \, dx = -\int \boldsymbol{f}(x,t)\, p(x,t) \cdot \nabla \log p(x,t) \, dx\]Using the identity $\nabla \log p = \frac{\nabla p}{p}$, we simplify:
\[= -\int \boldsymbol{f}(x,t) \cdot \nabla p(x,t) \, dx\]Now integrate by parts again, assuming boundary terms vanish:
\[= \int \nabla \cdot \boldsymbol{f}(x,t)\, p(x,t) \, dx\]2. The diffusion term:
\[-\frac{g^2(t)}{2} \int \Delta p(x,t) \log p(x,t)\, dx\]We use integration by parts, with the identity:
\[\int \Delta p \log p \, dx = -\int \frac{\|\nabla p\|^2}{p} \, dx = -\int p(x,t) \left\|\nabla \log p(x,t)\right\|^2 dx\]Hence, the diffusion contribution becomes:
\[\frac{g^2(t)}{2} \int p(x,t) \left\|\nabla \log p(x,t)\right\|^2 dx\]This is the Fisher information $I(p) = \int p |\nabla \log p|^2 dx$.
Final expression:
Combining both terms:
\[\frac{d H(t)}{dt} = \int \nabla \cdot \boldsymbol{f}(x,t)\, p(x,t)\, dx + \frac{g^2(t)}{2} \int p(x,t) \left\|\nabla \log p(x,t)\right\|^2 dx\]Or more compactly:
\[\boxed{ \frac{d H(t)}{dt} = \mathbb{E}[\nabla \cdot \boldsymbol{f}(x,t)] + \frac{g^2(t)}{2} I(p) }\]where:
- $\mathbb{E}[\nabla \cdot \boldsymbol{f}]$ is the expected divergence of the drift,
- $I(p)$ is the Fisher information of the distribution $p(x,t)$.
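Both identities can be double-checked symbolically on a concrete case. A minimal SymPy sketch, assuming a 1D standard Gaussian $p$ and the toy drift $f(x) = -x$ (both are assumptions for illustration, not the general proof):

```python
# SymPy check of Appendix M's two identities on an assumed 1D toy case:
# standard Gaussian p(x) and drift f(x) = -x.
import sympy as sp

x = sp.symbols('x', real=True)
p = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)    # standard Gaussian density
f = -x                                        # toy drift
logp = sp.expand_log(sp.log(p), force=True)   # = -x**2/2 - log(2*pi)/2

# Drift identity:  int (f p)' log p dx  =  int f' p dx
lhs1 = sp.integrate(sp.diff(f * p, x) * logp, (x, -sp.oo, sp.oo))
rhs1 = sp.integrate(sp.diff(f, x) * p, (x, -sp.oo, sp.oo))
print(sp.simplify(lhs1 - rhs1))               # 0

# Diffusion identity:  int p'' log p dx  =  -int p (log p)'^2 dx
lhs2 = sp.integrate(sp.diff(p, x, 2) * logp, (x, -sp.oo, sp.oo))
rhs2 = -sp.integrate(p * sp.diff(logp, x)**2, (x, -sp.oo, sp.oo))
print(sp.simplify(lhs2 - rhs2))               # 0
```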
Appendix O
The correct expression for the time derivative of the (differential) entropy is:
\[\boxed{ \frac{dH(t)}{dt} = -\int \frac{\partial p(x,t)}{\partial t} \log p(x,t)\, dx }\]Why not the material derivative $\frac{d p}{dt}$?
The material derivative $\frac{d p}{dt} = \frac{\partial p}{\partial t} + \mathbf{u} \cdot \nabla p$ describes how $p$ changes along a moving particle’s path in a flow field $\mathbf{u}(x,t)$. It’s used in fluid mechanics or stochastic processes when tracking individual trajectories.
However, entropy is a global functional of the probability density:
\[H(t) = -\int p(x,t) \log p(x,t) \, dx\]So when you take its time derivative, you apply the chain rule to the integrand:
\[\frac{dH}{dt} = -\int \left( \frac{\partial p}{\partial t} \log p + \frac{\partial p}{\partial t} \right) dx\]But since $\int \frac{\partial p}{\partial t} dx = \frac{d}{dt} \int p(x,t) dx = 0$, that second term vanishes.
So you’re left with:
\[\frac{dH}{dt} = -\int \frac{\partial p(x,t)}{\partial t} \log p(x,t) \, dx\]✅ Therefore:
Use the partial derivative $\partial p / \partial t$, not the material derivative $d p / dt$.
Checking the (incorrect) answer obtained if the total derivative is used instead:
\(\begin{aligned} \frac{d H(t)}{dt} &= -\int \left[\frac{d p(x,t)}{d t}\log p(x,t) + p(x,t) \frac{1}{p(x,t)}\frac{d p(x,t)}{d t}\right]dx\\ &= -\int \frac{d p(x,t)}{d t}\log p(x,t)dx - \int \frac{d p(x,t)}{d t}dx\\ &= -\int \frac{d p(x,t)}{d t}\log p(x,t)dx - \frac{d}{d t}\underbrace{\int p(x,t) dx}_{=1}\\ &= -\int \frac{d p(x,t)}{d t}\log p(x,t)dx \\ &= \int \left[\nabla \cdot[\boldsymbol{f}(x,t)]\, p(x,t)-\frac{g^2(t)}{2} \Delta p(x,t)\right] \log p(x,t)dx \\ &= \underbrace{\int\nabla \cdot[\boldsymbol{f}(x,t)]\, p(x,t)\log p(x,t)}_{\text{no simplification}} + \underbrace{\frac{g^2(t)}{2} I(p)}_{\text{Diffusion contribution}}\\ &\ne \frac{\partial H}{\partial t} \end{aligned}\)
Appendix P
To prove conservation of probability, you want to show that the total probability mass stays constant over time — i.e.,
\[\boxed{ \frac{d}{dt} \int p(x,t)\, dx = 0 }\]You are given the Fokker–Planck equation (also called the forward Kolmogorov equation):
\[\frac{\partial p(x,t)}{\partial t} = -\nabla \cdot \left[ \boldsymbol{f}(x,t)\, p(x,t) \right] + \frac{g^2(t)}{2} \Delta p(x,t)\]✅ Step-by-step Proof
We start by computing the time derivative of the total probability mass:
\[\frac{d}{dt} \int p(x,t)\, dx = \int \frac{\partial p(x,t)}{\partial t}\, dx\]Now substitute the right-hand side of the Fokker–Planck equation:
\[\int \left[ -\nabla \cdot (\boldsymbol{f} p) + \frac{g^2(t)}{2} \Delta p \right] dx\]Break this into two terms:
\[- \int \nabla \cdot (\boldsymbol{f} p)\, dx + \frac{g^2(t)}{2} \int \Delta p\, dx\]We now evaluate each term using the divergence theorem (or integration by parts):
🧮 Term 1: $\int \nabla \cdot (\boldsymbol{f} p)\, dx$
By the divergence theorem:
\[\int \nabla \cdot (\boldsymbol{f} p)\, dx = \oint_{\partial \Omega} (\boldsymbol{f} p) \cdot \hat{n}\, dS\]If we assume that $p(x,t) \to 0$ fast enough as $|x| \to \infty$ (e.g., for Gaussian-like distributions), then the surface integral at infinity vanishes:
\[\boxed{ \int \nabla \cdot (\boldsymbol{f} p)\, dx = 0 }\]🧮 Term 2: $\int \Delta p\, dx$
Again using integration by parts or the divergence theorem:
\[\int \Delta p\, dx = \int \nabla \cdot (\nabla p)\, dx = \oint_{\partial \Omega} \nabla p \cdot \hat{n}\, dS\]This also vanishes under the assumption that $\nabla p \to 0$ at the boundary (i.e., $p$ is smooth and decays rapidly):
\[\boxed{ \int \Delta p\, dx = 0 }\]✅ Final Result:
Putting both terms together:
\[\frac{d}{dt} \int p(x,t)\, dx = 0\]This proves conservation of probability mass.
📌 Assumptions:
This holds under mild regularity assumptions, such as:
- $p(x,t)$ and $\boldsymbol{f}(x,t)$ are smooth,
- $p(x,t) \to 0$ and $\nabla p(x,t) \to 0$ fast enough as $|x| \to \infty$ (e.g., exponential or Gaussian decay),
- No sources or sinks (mass creation or destruction).
These are standard in Fokker–Planck settings.
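As a numerical companion to this proof, here is a minimal sketch (the OU drift $f(x) = -x$, $g = 1$, the truncated box, and the explicit finite-volume scheme are all assumptions for illustration). Writing the right-hand side in flux form $-\partial_x J$ with $J = f\,p - D\,\partial_x p$ and zero flux at the far-away boundary makes the discrete total mass conserved, mirroring the divergence-theorem argument above:

```python
# Numerical illustration of Appendix P: evolve the Fokker-Planck equation
# with a conservative finite-volume scheme and watch total mass stay at 1.
# Assumed toy case: drift f(x) = -x, constant g = 1, large truncated box.
import numpy as np

box, m, dt, steps = 10.0, 401, 1e-4, 5000
x = np.linspace(-box, box, m)
h = x[1] - x[0]
p = np.exp(-(x - 2.0)**2 / 0.5)
p /= p.sum() * h                           # initial density with total mass 1

f = -x                                     # drift f(x) = -x
D = 0.5                                    # diffusion coefficient g^2/2, g = 1
for _ in range(steps):
    fp = f * p
    # flux J = f p - D dp/dx at the interior cell interfaces (conservative form)
    J = 0.5 * (fp[1:] + fp[:-1]) - D * (p[1:] - p[:-1]) / h
    J = np.concatenate(([0.0], J, [0.0]))  # zero flux at the box boundary
    p = p - dt * (J[1:] - J[:-1]) / h

print(p.sum() * h)                         # 1.0 up to float rounding
```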