Building an Automated Video Generation Pipeline with Claude Code

The pipeline that emerged

Aliaksei Zelianouski developed ‘Simona,’ a customized Claude Code setup capable of building its own toolset for video production. The entire creative effort cost $45.26 across multiple iterations and assets.

Why This Matters

While AI agents are often marketed as seamless, the technical reality involves managing ‘blind’ systems that cannot see their own visual output, leading to timing drifts and synchronization errors. The process demonstrates that high-fidelity output requires a hybrid approach: combining expensive generative AI for hero moments with cheap, deterministic ffmpeg scripts for the bulk of the runtime to manage costs and maintain control.

Key Insights

Cost-driven creative decisions: Total spend was $45.26 (2026); high per-clip costs forced a pivot from hyperrealistic images to a cheaper chalkboard style.
Skill accretion: The system uses ‘skills’ (small Python CLI wrappers with SKILL.md documentation) to freeze successful API paths and avoid rediscovery.
Deterministic editing over eyeballing: Due to LLM blindness, the pipeline relies on precise written editing patterns in ffmpeg rather than visual feedback loops.
Reference-to-video consistency: Using reference images and voice samples across models like Seedance 2.0 ensures character and narrator consistency between static and motion clips.
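
The skill pattern from the list above can be pictured as a thin CLI around one known-good API call. The sketch below is hypothetical: the generate-image name, its flags, and the call_image_api stub are illustrative and are not taken from Simona. A real skill would swap the stub for the provider call that worked and describe the flags in its SKILL.md.

```python
import argparse
import json


def call_image_api(prompt: str, style: str) -> dict:
    """Hypothetical stub standing in for the one API path that worked."""
    return {"prompt": prompt, "style": style, "status": "stubbed"}


def build_parser() -> argparse.ArgumentParser:
    # Flags are illustrative; a real skill would mirror its SKILL.md.
    parser = argparse.ArgumentParser(prog="generate-image")
    parser.add_argument("prompt", help="text prompt for the image")
    parser.add_argument(
        "--style",
        default="chalkboard",
        choices=["chalkboard", "hyperrealistic"],
        help="visual style; chalkboard is the cheaper default",
    )
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    result = call_image_api(args.prompt, args.style)
    # Emit JSON so the agent can parse the result without guessing.
    print(json.dumps(result))
    return 0
```

Calling main(["three doors", "--style", "chalkboard"]) prints the stubbed result as one JSON line. Keeping the known-good parameters behind fixed flags is what spares the agent from rediscovering the API on the next run.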

Working Examples

A Ken Burns effect (slow zoom from 1.0x to 1.4x over 100 frames), rendered at 4K and downscaled to 1080p with lanczos to prevent jitter.

ffmpeg -i doors.png -vf "zoompan=z='1+(1.4-1)*on/(100-1)':d=100:\
x='iw/2-iw/zoom/2':y='ih/2-ih/zoom/2':s=3840x2160:fps=25,\
scale=1920:1080:flags=lanczos" -frames:v 100 scene.mp4
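
Because the agent cannot see the result, it helps to generate the filter string from numbers instead of hand-editing it. The following is a minimal sketch, not the author's tooling: ken_burns_filter and zoom_at are hypothetical helper names, and the code treats on as a zero-based output frame index, which is an assumption worth checking against the ffmpeg build in use.

```python
def zoom_at(frame: int, zoom_end: float, frames: int) -> float:
    """Zoom factor of the linear ramp at a given output frame."""
    return 1 + (zoom_end - 1) * frame / (frames - 1)


def ken_burns_filter(
    zoom_end: float,
    frames: int,
    fps: int = 25,
    render_size: str = "3840x2160",
    out_size: str = "1920:1080",
) -> str:
    """Build a zoompan chain that zooms linearly from 1.0 to zoom_end,
    renders at render_size, then downscales with lanczos."""
    if frames < 2:
        raise ValueError("need at least 2 frames for a ramp")
    # Same ramp as zoom_at, written as a zoompan expression.
    zoom_expr = f"1+({zoom_end}-1)*on/{frames - 1}"
    return (
        f"zoompan=z='{zoom_expr}':d={frames}:"
        # Keep the crop window centred as the zoom grows.
        "x='iw/2-iw/zoom/2':y='ih/2-ih/zoom/2':"
        f"s={render_size}:fps={fps},"
        f"scale={out_size}:flags=lanczos"
    )
```

The string returned by ken_burns_filter(1.4, 100) can be passed to ffmpeg as the -vf argument, with -frames:v set to the same frame count.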

Mixing narration over ambient sound, using adelay to place each clip at its timestamp (in milliseconds, one value per channel) and normalize=0 to stop amix from attenuating the volume.

ffmpeg ... -filter_complex \
"[1:a]adelay=300|300[a1];[2:a]adelay=4500|4500[a2];[3:a]adelay=10000|10000[a3];\
[0:a][a1][a2][a3]amix=inputs=4:duration=first:normalize=0[out]" ...
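
The filter graph above grows by one adelay entry per narration clip, so it is a natural candidate for generation from a list of timestamps. The sketch below assumes input 0 is the ambient bed and inputs 1..N are stereo narration clips; narration_mix_filter is a hypothetical helper name, not part of the original pipeline.

```python
from typing import Sequence


def narration_mix_filter(start_times_ms: Sequence[int]) -> str:
    """Build a filter_complex string that delays each narration clip
    to its start time and mixes everything over input 0."""
    parts = []
    labels = []
    for index, delay_ms in enumerate(start_times_ms, start=1):
        label = f"[a{index}]"
        # adelay takes one delay per channel, in milliseconds;
        # the value is repeated for left and right.
        parts.append(f"[{index}:a]adelay={delay_ms}|{delay_ms}{label}")
        labels.append(label)
    total_inputs = len(start_times_ms) + 1  # narration clips + ambient bed
    mixed_inputs = "[0:a]" + "".join(labels)
    # duration=first ends the mix with input 0; normalize=0 keeps levels as-is.
    parts.append(
        f"{mixed_inputs}amix=inputs={total_inputs}:duration=first:normalize=0[out]"
    )
    return ";".join(parts)
```

Calling narration_mix_filter([300, 4500, 10000]) reproduces the three-clip graph shown above, so adding a fourth narration line only means appending one timestamp.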

Practical Applications

- Use case: Automated content creation using a modular ‘skill’ library (Simona) to wrap various AI APIs into reproducible CLI tools.
Pitfall: Granting agents unrestricted git access; a misdirected commit wiped two months of untracked assets.
- Use case: Seamless transitions between static zooms and AI video by outpainting a frame and compositing the original back via ffmpeg overlay with feathered edges.
Pitfall: Relying on generative AI for realistic human faces in Seedance 2.0, which triggers content guardrails.
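
One way to reduce the git risk named above is to check for untracked files before an agent is allowed to run any git command that can discard them. This is an illustrative guard, not something the source describes: the function names are hypothetical, and it relies only on the standard "git status --porcelain" output, where untracked entries start with "?? ".

```python
import subprocess
from typing import List


def untracked_paths(porcelain_output: str) -> List[str]:
    """Extract untracked paths from `git status --porcelain` output.
    Paths with unusual characters may appear quoted by git."""
    return [
        line[3:]
        for line in porcelain_output.splitlines()
        if line.startswith("?? ")
    ]


def read_git_status(repo_dir: str) -> str:
    """Run git status in repo_dir and return the porcelain text."""
    completed = subprocess.run(
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=repo_dir,
        capture_output=True,
        text=True,
        check=True,
    )
    return completed.stdout


def safe_for_destructive_git(porcelain_output: str) -> bool:
    """Allow destructive git commands only when nothing is untracked."""
    return not untracked_paths(porcelain_output)
```

A wrapper around the agent's git access could call safe_for_destructive_git(read_git_status(".")) and refuse to proceed, or back up the listed paths first, whenever it returns False.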

References:

https://dev.to/hiper2d/my-video-generation-pipeline-that-built-itself-459n
