MesmerTools

AI Video Generation Benchmark

11 LLMs each turned the same 4 prompts into Remotion motion-design videos through FrameCall's production pipeline. Every model got the same brief and went through the same renderer. What came out is what you see here, failures included.

Claude Fable 5 in action

Three of the four briefs, generated by Claude Fable 5, the newest model in the lineup, with zero hand edits. Keep scrolling to watch all 11 models attempt the same videos side by side.

Methodology
Each model received an identical brief for every video: the same instruction set, the same user prompt, and the same canvas specs, with nothing tailored per model. The code each one produced went straight into the same compile and render step, with no hand fixes and no error-correction loop in between. When a provider dropped a connection or returned nothing usable we retried and kept count, which is what the Tries column shows. A failed card is the model's own failure: it either produced no usable code or its code crashed during rendering. The Claude models ran through Claude Code instead of the API, so their generation times are left out and their token counts and costs are estimates based on response sizes priced at OpenRouter's Anthropic rates.
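The retry-and-count rule can be sketched roughly as below. This is a minimal illustration, not FrameCall's actual code: `call_model`, `GenerationResult`, and the `max_tries` cap are hypothetical names and values chosen for the example. Note that a retry just re-sends the same brief; nothing from a failed attempt is fed back to the model, in keeping with the "no error-correction loop" rule.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class GenerationResult:
    code: Optional[str]  # None when every attempt came back unusable
    tries: int           # attempts made, including the last one


def generate_with_retries(call_model: Callable[[], Optional[str]],
                          max_tries: int = 5) -> GenerationResult:
    """Re-send the same request until usable code arrives or tries run out.

    `call_model` is a hypothetical stand-in for one provider request. It
    returns the generated code, returns None or "" for an unusable response,
    or raises ConnectionError when the provider drops the connection.
    The cap of 5 is an arbitrary assumption for this sketch.
    """
    tries = 0
    while tries < max_tries:
        tries += 1
        try:
            code = call_model()
        except ConnectionError:
            continue  # dropped connection: already counted, try again
        if code and code.strip():
            return GenerationResult(code=code, tries=tries)
    return GenerationResult(code=None, tries=tries)


# Usage: one dropped connection, one empty response, then real code.
responses = iter([ConnectionError(), "", "export const Intro = () => null;"])


def flaky_call() -> Optional[str]:
    item = next(responses)
    if isinstance(item, Exception):
        raise item
    return item


result = generate_with_retries(flaky_call)
print(result.tries)  # 3
```

The returned code would then go to the compile and render step untouched. A crash there counts as the model's failure and does not trigger another attempt.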

Leaderboard

The order comes from watching all four videos from every model and judging the output quality by hand.


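| #  | Model                        | Ran via     | Videos rendered | Tries | Avg gen time | Avg code size (chars) | Total cost |
|----|------------------------------|-------------|-----------------|-------|--------------|-----------------------|------------|
| 1  | Gemini 3.1 Pro (Google)      | OpenRouter  | 4/4             | 13    | 110.0s       | 16.3k                 | $0.800     |
| 2  | Claude Fable 5 (Anthropic)   | Claude Code | 4/4             | 4     | n/a          | 23.8k                 | ~$2.055    |
| 3  | Claude Opus 4.8 (Anthropic)  | Claude Code | 4/4             | 4     | n/a          | 25.9k                 | ~$1.129    |
| 4  | GPT-5.5 (OpenAI)             | OpenRouter  | 4/4             | 4     | 227.3s       | 28.4k                 | $1.568     |
| 5  | Gemini 3.5 Flash (Google)    | OpenRouter  | 4/4             | 5     | 63.3s        | 30.0k                 | $0.525     |
| 6  | Claude Sonnet 4.6 (Anthropic)| Claude Code | 4/4             | 4     | n/a          | 21.2k                 | ~$0.574    |
| 7  | Kimi K2.6 (Moonshot AI)      | OpenRouter  | 4/4             | 15    | 378.5s       | 18.1k                 | $0.292     |
| 8  | GLM 5.1 (Z.ai)               | OpenRouter  | 4/4             | 8     | 263.8s       | 26.5k                 | $0.302     |
| 9  | MiMo V2.5 Pro (Xiaomi)       | OpenRouter  | 3/4             | 21    | 359.5s       | 25.2k                 | $0.023     |
| 10 | DeepSeek V4 Pro (DeepSeek)   | OpenRouter  | 3/4             | 5     | 142.4s       | 18.0k                 | $0.062     |
| 11 | DeepSeek V4 Flash (DeepSeek) | OpenRouter  | 4/4             | 5     | 117.1s       | 29.9k                 | $0.019     |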
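One way to read the Videos rendered and Total cost columns together is cost per rendered video. The sketch below copies only those two columns from the leaderboard. Treat the result as a rough figure: the total cost presumably also covers retries and failed attempts, and the three Claude costs are the page's own "~" estimates.

```python
# (videos rendered, total cost in USD) copied from the leaderboard.
# Claude costs are estimates on the page itself, so their ratios are too.
LEADERBOARD = {
    "Gemini 3.1 Pro": (4, 0.800),
    "Claude Fable 5": (4, 2.055),
    "Claude Opus 4.8": (4, 1.129),
    "GPT-5.5": (4, 1.568),
    "Gemini 3.5 Flash": (4, 0.525),
    "Claude Sonnet 4.6": (4, 0.574),
    "Kimi K2.6": (4, 0.292),
    "GLM 5.1": (4, 0.302),
    "MiMo V2.5 Pro": (3, 0.023),
    "DeepSeek V4 Pro": (3, 0.062),
    "DeepSeek V4 Flash": (4, 0.019),
}
PROMPTS_PER_MODEL = 4

total_slots = len(LEADERBOARD) * PROMPTS_PER_MODEL
total_rendered = sum(rendered for rendered, _ in LEADERBOARD.values())
cost_per_video = {
    model: cost / rendered for model, (rendered, cost) in LEADERBOARD.items()
}

print(total_slots, total_rendered)  # 44 42
for model, usd in sorted(cost_per_video.items(), key=lambda kv: kv[1]):
    print(f"{model:<18} ${usd:.4f} per rendered video")
```

The ranking itself is a hand judgment of output quality, so this ratio says nothing about placement. It only puts the cost column on a per-video footing.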

1. YouTube Channel Intro

1920×1080 · 16:9 · 30fps
Exact prompt every model received:
Create a polished 12-second YouTube channel intro for a tech review channel called "TECHPULSE".

Include: logo reveal (a "TP" monogram), channel name animation, tagline "honest reviews • weekly uploads", social handles (@techpulse, /techpulse, techpulse.com), and a subscribe CTA with bell icon.

Should look like something a channel with 500k+ subs would use. Professional but approachable vibe. Dark theme with blue accents.
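A Remotion composition is declared with a width, height, fps, and a duration in frames, so each brief's canvas line and requested length pin down those numbers before any model writes code. The sketch below works them out for the four briefs in this benchmark. The helper names are made up for the example, and the durations come from the prompts' own wording (12 or 15 seconds).

```python
from math import gcd

# (width, height, fps, seconds) from each brief's canvas line and prompt text.
BRIEFS = {
    "YouTube Channel Intro": (1920, 1080, 30, 12),
    "SaaS Product Launch Teaser": (1080, 1080, 30, 12),
    "Instagram Story Ad": (1080, 1920, 30, 15),
    "App Feature Walkthrough": (1920, 1080, 30, 15),
}


def duration_in_frames(seconds: float, fps: int) -> int:
    """Frames needed to fill `seconds` of video at `fps`."""
    return round(seconds * fps)


def aspect_ratio(width: int, height: int) -> str:
    """Reduce width:height to lowest terms, e.g. 1920x1080 -> '16:9'."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


for name, (width, height, fps, seconds) in BRIEFS.items():
    frames = duration_in_frames(seconds, fps)
    print(f"{name}: {width}x{height} ({aspect_ratio(width, height)}), "
          f"{frames} frames at {fps}fps")
```

For this intro, that works out to 360 frames at 1920×1080. The two 15-second briefs need 450 frames each.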

2. SaaS Product Launch Teaser

1080×1080 · 1:1 · 30fps
Exact prompt every model received:
Create a hype 12-second product launch teaser for a SaaS app called "FlowBoard" - a team productivity tool.

Show off the product name with dramatic typography (gradient text looks good), highlight 3 key features: "Real-time Collaboration", "AI Task Prioritization", "Team Analytics". Include an app mockup, tagline "Built for teams that ship", waitlist CTA, and the URL flowboard.io.

Modern startup aesthetic, square format for social. Make it feel like a Product Hunt launch video.

3. Instagram Story Ad

1080×1920 · 9:16 · 30fps
Exact prompt every model received:
Create a 15-second Instagram Story ad for wireless earbuds called "SoundPods Pro".

Product showcase with: product visual (earbud silhouette), name, price ($79, crossed out $149 showing discount), features (40hr battery, ANC, wireless charging), social proof (5-star rating, "2,847 reviews", customer quote), and Shop Now CTA with swipe-up prompt.

Should feel premium like an Apple or Samsung ad. Clean light background, product is the hero. Vertical story format, readable text at mobile size.
💥 No usable code returned: Empty content. finish_reason=length
#9 MiMo V2.5 Pro (Xiaomi) · OpenRouter · 1125.6s · 12 tries
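"Empty content" with `finish_reason=length` means the response hit its token limit without leaving any visible text. One plausible cause, which the page does not confirm, is a model spending the whole budget before emitting its answer. A minimal sketch of how a harness might label such a response, assuming the OpenAI-style chat completion shape that OpenRouter returns; `classify_response` and its labels are hypothetical:

```python
def classify_response(response: dict) -> str:
    """Label a chat-completion response as usable or not.

    Assumes the OpenAI-style shape:
    {"choices": [{"finish_reason": ..., "message": {"content": ...}}]}
    The label names are made up for this sketch.
    """
    choices = response.get("choices") or []
    if not choices:
        return "no_choices"
    choice = choices[0]
    content = (choice.get("message") or {}).get("content") or ""
    truncated = choice.get("finish_reason") == "length"
    if not content.strip():
        # Nothing to compile; note whether the token limit was the cause.
        return "empty_truncated" if truncated else "empty"
    # Non-empty but cut off: the code is probably incomplete.
    return "truncated" if truncated else "ok"


failed = {"choices": [{"finish_reason": "length", "message": {"content": ""}}]}
print(classify_response(failed))  # empty_truncated
```

Under the retry rule in the methodology, any label other than "ok" would count as a try and lead to another attempt. Here 12 tries were recorded and the card still shows no usable code.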

4. App Feature Walkthrough

1920×1080 · 16:9 · 30fps
Exact prompt every model received:
Create a 15-second feature walkthrough for a task app's "Smart Scheduling" feature.

Show the user flow: phone mockup with task list, adding a new task "Finish quarterly report", toggling on "Smart Schedule" (with AI sparkle), watching it auto-slot into the calendar, getting a reminder notification. End with tagline "Smart Scheduling — Let AI handle the when".

Should look like a real landing page demo video. Clean modern UI, satisfying interactions, professional feel.
💥 Code didn't compile: Malformed path data: M was expected to have numbers afterwards
#10 DeepSeek V4 Pro (DeepSeek) · OpenRouter · 157.1s · 27.1k chars · 2 tries
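The error text points at an SVG path string where a moveto command (`M`) is not followed by coordinates. That typically happens when a path is assembled from values that are missing or not numeric. The page does not show the offending path, so the sketch below only illustrates the kind of input that would trigger such a message. It is a deliberately small check, not a full SVG path parser, and `moveto_has_coordinates` is a hypothetical helper.

```python
import re

# Single command letters, or numbers such as 10, -1.5, .25, 1e3.
_TOKEN = re.compile(r"[A-Za-z]|[-+]?(?:\d+\.?\d*|\.\d+)(?:[eE][-+]?\d+)?")


def moveto_has_coordinates(path_data: str) -> bool:
    """Return False if any M/m command lacks an x,y pair right after it."""
    tokens = _TOKEN.findall(path_data)
    for index, token in enumerate(tokens):
        if token in ("M", "m"):
            following = tokens[index + 1:index + 3]
            if len(following) < 2 or any(t.isalpha() for t in following):
                return False
    return True


print(moveto_has_coordinates("M10 20 L30 40"))    # True
print(moveto_has_coordinates("M L30 40"))         # False
print(moveto_has_coordinates("M NaN NaN L 1 1"))  # False
```

Since the benchmark allows no hand fixes and no error-correction loop, a string like the second or third one fails the render outright and the slot is recorded as the model's failure.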

Bonus: recreating a real music video

960×720 · 4:3 · 30fps · 21s

Off the leaderboard, we pushed the top Claude model further: recreate the opening of a real lyric video — “...Baby One More Time” — shot for shot, as a single Remotion composition. Claude Fable 5 designed an original mascot, synced every lyric to the word it lands on, and rebuilt the scene cuts, whip pans, and camera punch-ins from a written spec of the original.

“...Baby One More Time” — OpenMotion mascot cut
Claude Fable 5 · word-level lyric sync · zero hand edits
Made with FrameCall

Want videos like these for your own product?

Every one of the 42 videos on this page came out of FrameCall. Type a prompt, pick a model, and get a real motion-design video you can edit, export, and ship. No timeline editor and no After Effects required.

Prompt in · Video out · 42 proofs above
11 models × 4 prompts = 44 video slots. 42 of them rendered and 2 failed, all through FrameCall's export pipeline.