What is ViMax?
ViMax is an open-source multi-agent video generation framework from HKUDS (The University of Hong Kong) that automates the entire video creation pipeline β from a raw idea to a finished, minute-long video with consistent characters, professional storyboarding, and synchronized audio.
Unlike most AI video tools that generate only a few seconds of footage, ViMax orchestrates 12 specialized agents that collaborate like a real film production team: a Screenwriter, Storyboard Artist, Character Extractor, Reference Image Selector, Best Image Selector, and more β all coordinated by a central orchestration layer.
The Problem ViMax Solves
Current AI video generation tools suffer from critical limitations:
- Short clip syndrome β Most tools produce only 2-5 second clips, not minute-long or hour-long content
- Consistency chaos β Characters and scenes change unpredictably across frames
- Visual-only focus β Missing scripts, audio, narrative structure, and storytelling depth
- Manual production bottlenecks β Reference image acquisition, consistency checking, storyboard design, and shot planning all require human expertise
ViMax eliminates these bottlenecks by automating the entire pipeline from narrative input to final video output.
Four Creative Modes
ViMax offers four distinct creation modes, each powered by the same multi-agent engine:
1. Idea2Video β From Spark to Screen
Transform raw ideas into complete video stories. Simply describe your concept and ViMax autonomously handles scriptwriting, storyboarding, character creation, and final video generation β all end-to-end.
idea = """
If a cat and a dog are best friends, what would happen when they meet a new cat?
"""
user_requirement = """
For children, do not exceed 3 scenes.
"""
style = "Cartoon"
2. Novel2Video β Smart Literary Adaptation
Transform complete novels into episodic video content with intelligent narrative compression, character tracking, and scene-by-scene visual adaptation. The RAG-based long script design engine analyzes lengthy stories and automatically segments them into multi-scene scripts while retaining key plot developments.
3. Script2Video β Unlimited Screenplay Creation
Write any screenplay β from personal stories to epic adventures β and ViMax brings it to life with complete control over every aspect of visual storytelling.
script = """
EXT. SCHOOL GYM - DAY
A group of students are practicing basketball in the gym...
John: (dribbling the ball) I'm going to score a basket!
Jane: (smiling) Good job, John!
"""
user_requirement = "Fast-paced with no more than 20 shots."
style = "Animate Style"
4. AutoCameo β Interactive Video from Your Photo
Upload your photo and ViMax intelligently integrates you as a character with consistent appearance and natural interactions throughout the entire video. Create your own cameo across limitless creative scripts.
12 Specialized Agents
The power of ViMax lies in its agent architecture. Each agent handles a specific production role:
| Agent | Role |
|---|---|
| Screenwriter | Generates scripts from ideas and requirements |
| Script Planner | Structures narrative flow and scene boundaries |
| Script Enhancer | Refines scripts for visual storytelling |
| Novel Compressor | Compresses long novels into segmented scripts |
| Character Extractor | Extracts character descriptions and traits |
| Scene Extractor | Identifies scene boundaries and environments |
| Event Extractor | Maps key events and plot points |
| Character Portraits Generator | Creates visual character references |
| Storyboard Artist | Designs shot-level storyboards with cinematography language |
| Camera Image Generator | Simulates multi-camera filming angles |
| Reference Image Selector | Intelligently selects reference images for first-frame accuracy |
| Best Image Selector | Parallel image generation with MLLM/VLM consistency check |
| Global Information Planner | Manages cross-scene continuity and temporal coherence |
Technical Capabilities
RAG-Based Long Script Generation
The script design engine intelligently analyzes lengthy, novel-like stories and automatically segments them into multi-scene script format, meticulously ensuring all key plot developments and character dialogues are accurately retained.
Expressive Storyboard Design
Shot-level storyboard system creates expressive storyboards through cinematography language based on user requirements and target audiences, establishing narrative rhythm for subsequent video generation.
Multi-Camera Filming Simulation
Simulates multi-camera filming to deliver an immersive viewing experience while maintaining consistent character positioning and backgrounds within the same scene.
Intelligent Reference Image Selection
Intelligently selects the reference image required for the first frame of the current video, including storyboards from the previous timeline, ensuring accuracy of multiple characters and environmental elements as the video becomes longer.
Automated Consistency Check
Generates multiple images in parallel and selects the best consistent image as the first frame through MLLM/VLM β imitating the workflow of human creators who review and select the best takes.
High-Efficiency Parallel Shot Generation
Parallel processing for sequential shots captured from the same camera enables highly efficient video production.
Three Pipeline Architectures
ViMax ships with three pre-built pipelines:
| Pipeline | Entry Point | Input | Use Case |
|---|---|---|---|
| Idea2Video | main_idea2video.py | Natural language idea | Quick creative exploration |
| Script2Video | main_script2video.py | Screenplay/script | Controlled narrative production |
| Novel2Movie | novel2movie_pipeline.py | Full novel text | Long-form literary adaptation |
Supported Backends
ViMax supports multiple generation backends:
- Image Generation: Google Nanobanana API, Yunwu API, Doubao Seedream
- Video Generation: Google Veo API, Yunwu API, Doubao Seedance
- Chat Models: OpenAI-compatible (via OpenRouter), MiniMax (M2.7 with 1M context, M2.5 with 204K context)
- Reranking: BGE via Silicon API
Quick Start
# Clone and install
git clone https://github.com/HKUDS/ViMax.git
cd ViMax
uv sync
# Configure API keys in configs/idea2video.yaml
Configure your model and API keys in configs/idea2video.yaml:
chat_model:
init_args:
model: google/gemini-2.5-flash-lite-preview-09-2025
model_provider: openai
api_key: <YOUR_API_KEY>
base_url: https://openrouter.ai/api/v1
image_generator:
class_path: tools.ImageGeneratorNanobananaGoogleAPI
init_args:
api_key: <YOUR_API_KEY>
video_generator:
class_path: tools.VideoGeneratorVeoGoogleAPI
init_args:
api_key: <YOUR_API_KEY>
Then run:
python main_idea2video.py
Why ViMax Matters
ViMax represents a paradigm shift in AI video generation β moving from single-shot clip generators to full production pipelines. By decomposing video creation into specialized agent roles (just like a real film crew), ViMax achieves:
- Long-form consistency β Characters and environments remain stable across hundreds of shots
- Creative freedom β From any narrative format (idea, script, novel) to finished video
- Professional quality β Automated storyboarding, shot design, and consistency validation
- Production efficiency β Parallel processing and intelligent resource management
For developers and creators looking to build AI-powered video production workflows, ViMax provides the most complete open-source multi-agent framework available today.
Repository: github.com/HKUDS/ViMax Enjoyed this post? Never miss out on future posts by following us
Stars: 5.7K+ | License: MIT | Language: Python 3.12