Model Benchmarks

Real outputs instead of synthetic eval scores. Each benchmark gives every model the same prompts through the same pipeline, then shows you everything that came out, including the failures.

AI Video Generation

Each of 17 LLMs generates 7 Remotion motion-design videos from identical prompts via FrameCall. The lineup includes GPT-5.6, Claude, Gemini 3.1 Pro, DeepSeek V4 and Kimi K2.6.

Watch the results

More benchmarks are coming. Have an idea for a model face-off? Tell us.