Training Flow and Diffusion Models
These are some notes I took while watching MIT 6.S184: Lecture 03 and reading [1].
1. Overview
Flow Matching [1] is a method for learning a Continuous Normalizing Flow that transports samples between any two distributions. It shows that Diffusion Models are actually just a specific type of flow matching in which the probability path is defined by a Gaussian noise schedule.

Flow Matching + straight conditional probability path (schedule) = Optimal Transport path (most popular, due to the ease of obtaining training examples).

Flow Matching + curved conditional probability path (diffusion schedule, i.e. Gaussian noise) = Diffusion Model (trained with a better objective).
So to summarize, the two key features of flow matching are 1) arbitrary priors and 2) flexible paths:
- Arbitrary Priors: While most generative models, like standard Diffusion, require a Gaussian prior, Flow Matching is more flexible. In practice, a Gaussian is frequently chosen in flow matching as well.
- Flexible Paths: Flow Matching can use a variety of probability paths to connect the prior and the data, such as Optimal Transport (OT) paths, which create trajectories that allow faster sampling (bigger step sizes). While the learned flow is not perfectly straight (due to path crossings), it is straighter than diffusion paths, allowing generation in fewer steps (e.g., 10-20 instead of 50+). Techniques like Reflow [2] can straighten the paths further for 1-step generation.
2. Notation
To make the math easier to follow, here are the symbols used in this post:

| Symbol | Meaning |
| --- | --- |
| $x$ | A sample point in the vector field (can be noise, data, or in-between). |
| $z$ | A sample from the data distribution ($z \sim p_{\text{data}}$). |
| $\epsilon$ | A sample from the prior (noise) distribution ($\epsilon \sim \mathcal{N}(0, I_d)$). |
| $t$ | Time step $t \in [0, 1]$, where $t = 0$ is noise and $t = 1$ is data. |
| $u_t(x)$ | The vector field (velocity) at location $x$ and time $t$. |
| $\alpha_t, \beta_t$ | Coefficients determining the path schedule. |
| $\theta$ | The learnable parameters of the neural network. |
3. The big picture of arriving at a trainable objective in flow matching
- Our ideal goal is to learn the marginal vector field $u_t^{\text{target}}(x)$, so we can try to formulate the flow matching loss against it:

  $$\mathcal{L}_{\text{FM}}(\theta) = \mathbb{E}_{t,\, x \sim p_t} \left\| u_t^{\theta}(x) - u_t^{\text{target}}(x) \right\|^2$$

  Yet the issue is that $u_t^{\text{target}}(x)$ is intractable, because computing it requires an integral over the entire data distribution.
- To fix this intractability, we use the Conditional Flow Matching (CFM) loss, which regresses against the easy-to-compute conditional vector field $u_t^{\text{target}}(x \mid z)$:

  $$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t,\, z \sim p_{\text{data}},\, x \sim p_t(\cdot \mid z)} \left\| u_t^{\theta}(x) - u_t^{\text{target}}(x \mid z) \right\|^2$$

  It can be proved that the marginal flow matching loss equals the conditional flow matching loss up to a constant that does not depend on the neural network parameters: $\mathcal{L}_{\text{FM}}(\theta) = \mathcal{L}_{\text{CFM}}(\theta) + C$. Because their gradients are the same ($\nabla_\theta \mathcal{L}_{\text{FM}} = \nabla_\theta \mathcal{L}_{\text{CFM}}$), we can minimize the easy, conditional loss to implicitly solve the hard, marginal one.
- Now that we know we can use the conditional version, we pick a specific conditional probability path. For instance, we can use a Gaussian conditional probability path: $p_t(\cdot \mid z) = \mathcal{N}(\alpha_t z, \beta_t^2 I_d)$.
- So what's the target? For this path, we can analytically calculate exactly what $u_t^{\text{target}}(x \mid z)$ should be using the time derivatives $\dot{\alpha}_t$ and $\dot{\beta}_t$. Using the standard reparametrization trick, we express $x_t$ as a function of standard Gaussian noise $\epsilon \sim \mathcal{N}(0, I_d)$:

  $$x_t = \alpha_t z + \beta_t \epsilon$$

  This allows us to substitute every instance of $x_t$ in the loss with $\alpha_t z + \beta_t \epsilon$.
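The reparametrized sampling step can be sketched in a few lines of NumPy. The function name `sample_conditional_path` is my own; the default schedulers plug in the linear $\alpha_t = t$, $\beta_t = 1 - t$ choice discussed next, but any schedule can be passed in:

```python
import numpy as np

def sample_conditional_path(z, t, alpha=lambda t: t, beta=lambda t: 1.0 - t, rng=None):
    """Sample x_t ~ N(alpha_t * z, beta_t^2 * I) via x_t = alpha_t * z + beta_t * eps."""
    rng = np.random.default_rng() if rng is None else rng
    eps = rng.standard_normal(z.shape)  # reparametrization: eps ~ N(0, I)
    return alpha(t) * z + beta(t) * eps, eps

z = np.ones(4)                              # a "data" point
x0, _ = sample_conditional_path(z, 0.0)     # t=0: pure noise (alpha=0, beta=1)
x1, _ = sample_conditional_path(z, 1.0)     # t=1: exactly z (alpha=1, beta=0)
```

Note how the convention matches the notation table: $t=0$ gives a prior sample and $t=1$ recovers the data point exactly.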
- Finally, we choose the simplest schedulers for $\alpha_t$ and $\beta_t$.

  The Setup: We set $\alpha_t = t$ and $\beta_t = 1 - t$. This specific linear interpolation gives us the Conditional Optimal Transport (CondOT) path [2].

  The Loss: This leads to the simple training objective:

  $$\mathcal{L}_{\text{CFM}}(\theta) = \mathbb{E}_{t,\, z,\, \epsilon} \left\| u_t^{\theta}(x_t) - (z - \epsilon) \right\|^2 \quad \text{where} \quad x_t = t z + (1 - t)\epsilon$$
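The whole training recipe fits in a short loop. As a minimal sketch (my own toy setup, not from the lecture), a tiny linear model in affine features $[x, t, 1]$ stands in for the neural network $u_t^\theta$, and the data distribution is a Gaussian blob; the loop regresses the model onto the CondOT target $z - \epsilon$:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 2

def sample_data(n):
    # Toy "data" distribution: a tight Gaussian blob around (2, 2).
    return 2.0 + 0.1 * rng.standard_normal((n, d))

def features(x, t):
    # Affine features [x, t, 1]: a linear model stands in for the network.
    return np.concatenate([x, t, np.ones((len(x), 1))], axis=1)

W = np.zeros((d, d + 2))            # u_theta(x, t) = features(x, t) @ W.T
lr, losses = 0.05, []
for step in range(2000):
    z = sample_data(64)
    t = rng.uniform(size=(64, 1))
    eps = rng.standard_normal((64, d))
    x_t = t * z + (1.0 - t) * eps   # CondOT path: x_t = t*z + (1-t)*eps
    target = z - eps                # conditional vector field u(x_t | z)
    pred = features(x_t, t) @ W.T
    losses.append(np.mean((pred - target) ** 2))
    # Gradient of the mean-squared CFM objective w.r.t. W.
    W -= lr * 2.0 * (pred - target).T @ features(x_t, t) / len(z)
```

The loop mirrors the objective term by term: sample $t$, $z$, $\epsilon$, form $x_t$, and regress the model output onto $z - \epsilon$.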
To elaborate on later:
- "First-gen diffusion papers only did score matching."
- The first diffusion models only used discrete time (no ODE/SDE formulation).
- How different is the diffusion objective from the flow matching loss?
- Is the cosine schedule of diffusion equivalent to linear interpolation in FM?
- Is the Euler method used in practice, or do people prefer higher-order methods like Heun's method?
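For reference while chasing the Euler question: sampling just integrates $dx/dt = u_t(x)$ from $t=0$ to $t=1$. A minimal sketch (my own, with illustrative names) uses the exact CondOT field for a single data point $z^\*$, where $u_t(x \mid z^\*) = (z^\* - x)/(1 - t)$ follows from $x_t = t z^\* + (1-t)\epsilon$:

```python
import numpy as np

def euler_sample(u, x0, n_steps=20):
    """Integrate dx/dt = u(x, t) from t=0 to t=1 with the Euler method."""
    x, h = x0.copy(), 1.0 / n_steps
    for k in range(n_steps):
        t = k * h                  # t stays strictly below 1, avoiding 1/(1-t) blow-up
        x = x + h * u(x, t)
    return x

# Exact CondOT field when the data distribution is a single point z*.
z_star = np.array([2.0, -1.0])
u = lambda x, t: (z_star - x) / (1.0 - t)

rng = np.random.default_rng(0)
x1 = euler_sample(u, rng.standard_normal(2), n_steps=20)  # lands exactly on z*
```

For this field the final Euler step has coefficient $h/(1-t) = 1$, so the trajectory lands exactly on $z^\*$ regardless of step count; with a learned, approximate field that exactness disappears, which is where higher-order methods like Heun's come in.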
Bibliography
- [1] Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, “Flow Matching for Generative Modeling,” in 11th International Conference on Learning Representations, 2023.
- [2] X. Liu, C. Gong, and Q. Liu, “Flow Straight And Fast: Learning To Generate And Transfer Data With Rectified Flow,” in 11th International Conference on Learning Representations, 2023.
Thanks!