Pulse of Motion

Pulse of Motion Benchmark

Measuring temporal realism in AI video generation. Does your model produce motion that matches real-world physics? The PoM benchmark evaluates this using PhyFPS — a metric that predicts frame rate directly from visual dynamics, without reading metadata.

What We Measure

PhyFPS (Physical Frames Per Second) captures how closely AI-generated video motion matches the temporal dynamics of the real world. A model with low PhyFPS error produces videos where objects move at physically plausible speeds.

For details, refer to Pulse of Motion.


Avg. Error

Mean absolute difference between the predicted PhyFPS and the meta FPS taken from the video container, averaged over the clips of each video and then over all videos.

\[
\text{Avg. Error} = \frac{1}{V}\sum_{v=1}^{V} \frac{1}{C_v}\sum_{c=1}^{C_v}\left|\hat{f}_{v,c} - F_{\text{meta},c}\right|
\]
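As a rough illustration of the Avg. Error formula, the sketch below averages the absolute error over the clips of each video and then over videos. The function name and the nested-list layout (`predicted[v][c]`, `meta[v][c]`) are assumptions made for this example, not the benchmark's actual code.

```python
from statistics import mean

def avg_error(predicted, meta):
    """Absolute PhyFPS error, averaged per video and then across videos.

    predicted[v][c]: predicted PhyFPS for clip c of video v.
    meta[v][c]: meta FPS for the same clip (hypothetical layout).
    """
    per_video = [
        mean(abs(p - m) for p, m in zip(pred_clips, meta_clips))
        for pred_clips, meta_clips in zip(predicted, meta)
    ]
    return mean(per_video)

# Two videos: three clips at 24 fps, two clips at 30 fps.
predicted = [[23.0, 25.0, 24.5], [31.0, 28.0]]
meta = [[24.0, 24.0, 24.0], [30.0, 30.0]]
print(avg_error(predicted, meta))  # per-video means are 2.5/3 and 1.5
```

Because the inner mean is taken per video first, a long video with many clips does not outweigh a short one.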

Pct. Error

Absolute error expressed as a percentage of meta FPS, so that results can be compared across frame rate ranges.

\[
\text{Pct. Error} = \frac{100}{V}\sum_{v=1}^{V} \frac{1}{C_v}\sum_{c=1}^{C_v}\frac{\left|\hat{f}_{v,c} - F_{\text{meta},c}\right|}{F_{\text{meta},c}}
\]
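To show why the normalization matters, here is a minimal sketch of the Pct. Error formula applied to the same 2 fps miss at two different frame rates. The function name and input layout are assumptions for this example only.

```python
def pct_error(predicted, meta):
    """Percentage PhyFPS error, averaged per video and then across videos.

    predicted[v][c] and meta[v][c] use a hypothetical nested-list layout.
    """
    per_video = [
        sum(abs(p - m) / m for p, m in zip(pred_clips, meta_clips)) / len(pred_clips)
        for pred_clips, meta_clips in zip(predicted, meta)
    ]
    return 100.0 * sum(per_video) / len(per_video)

# The same absolute miss of 2 fps weighs more at 24 fps than at 60 fps.
low_rate = pct_error([[26.0]], [[24.0]])   # about 8.33
high_rate = pct_error([[62.0]], [[60.0]])  # about 3.33
print(low_rate, high_rate)
```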

Intra-Video CV

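Coefficient of variation (standard deviation divided by mean) of the predicted PhyFPS across the sliding-window clips of each video, averaged over all videos. It measures temporal consistency: lower values mean the predicted frame rate varies less within a video.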

\[
\text{Intra CV} = \frac{1}{V}\sum_{v=1}^{V} \frac{\text{Std}\left(\{\hat{f}_{v,c}\}\right)}{\text{Mean}\left(\{\hat{f}_{v,c}\}\right)}
\]
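The Intra CV formula could be sketched as follows. The source does not say whether Std is the population or the sample standard deviation. This example assumes the population form (`statistics.pstdev`), and the function name and input layout are likewise hypothetical.

```python
from statistics import mean, pstdev

def intra_cv(predicted):
    """Mean coefficient of variation of per-clip PhyFPS within each video.

    predicted[v] is the list of per-clip PhyFPS predictions for video v.
    Assumption: population standard deviation (pstdev).
    """
    return mean(pstdev(clips) / mean(clips) for clips in predicted)

steady = [24.0, 24.0, 24.0]   # perfectly consistent: CV = 0
drifting = [20.0, 30.0]       # std 5, mean 25: CV = 0.2
cv = intra_cv([steady, drifting])
print(cv)
```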

Text-Video Alignment

CLIP-based cosine similarity between input text prompt and generated video. A supplementary metric — not a primary evaluation dimension.

Note: Submissions scoring below 0.16 may lack meaningful text-video alignment.
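A minimal sketch of the similarity computation, assuming the text and frame embeddings have already been produced by a CLIP model. How the benchmark pools frames into a single video score is not stated here. Averaging per-frame cosine similarities is an assumption of this example, and the names are hypothetical.

```python
import math

ALIGNMENT_FLOOR = 0.16  # threshold taken from the note above

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def text_video_alignment(text_emb, frame_embs):
    """Assumed pooling: mean cosine similarity over sampled frames."""
    return sum(cosine(text_emb, f) for f in frame_embs) / len(frame_embs)

# Toy 2-D embeddings standing in for real CLIP vectors.
text_emb = [1.0, 0.0]
frame_embs = [[1.0, 0.0], [0.0, 1.0]]
score = text_video_alignment(text_emb, frame_embs)
print(score, score < ALIGNMENT_FLOOR)
```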

Dynamic FPS detection: Our pipeline automatically reads each video's per-frame timestamps and computes per-clip meta FPS, supporting both constant and variable frame rate videos.
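One way per-clip meta FPS could be derived from per-frame timestamps is sketched below. The definition used here (frame intervals in the clip divided by the clip's duration), the sliding-window parameters `clip_len` and `stride`, and the function name are all assumptions for illustration, not the pipeline's documented behavior.

```python
def clip_meta_fps(timestamps, clip_len, stride):
    """Meta FPS for each sliding-window clip of a video.

    timestamps: per-frame presentation times in seconds, ascending.
    Assumed definition: (clip_len - 1) intervals / clip duration.
    """
    if clip_len < 2:
        raise ValueError("clip_len must be at least 2")
    fps = []
    for start in range(0, len(timestamps) - clip_len + 1, stride):
        window = timestamps[start:start + clip_len]
        duration = window[-1] - window[0]
        fps.append((clip_len - 1) / duration)
    return fps

# Variable frame rate: 0.1 s frame spacing first, then 0.05 s spacing.
timestamps = [0.0, 0.1, 0.2, 0.3, 0.4, 0.45, 0.5, 0.55, 0.6]
fps = clip_meta_fps(timestamps, clip_len=5, stride=4)
print(fps)  # roughly 10 fps for the first clip, 20 fps for the second
```

For a constant frame rate video every clip yields the same value, so the same code path covers both cases.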

How It Works

Four simple steps to benchmark your video generation model.