What is ViMax?

ViMax is an open-source multi-agent video generation framework from HKUDS (The University of Hong Kong) that automates the entire video creation pipeline β€” from a raw idea to a finished, minute-long video with consistent characters, professional storyboarding, and synchronized audio.

Unlike most AI video tools that generate only a few seconds of footage, ViMax orchestrates 12 specialized agents that collaborate like a real film production team: a Screenwriter, Storyboard Artist, Character Extractor, Reference Image Selector, Best Image Selector, and more β€” all coordinated by a central orchestration layer.

ViMax Architecture

The Problem ViMax Solves

Current AI video generation tools suffer from critical limitations:

  • Short clip syndrome β€” Most tools produce only 2-5 second clips, not minute-long or hour-long content
  • Consistency chaos β€” Characters and scenes change unpredictably across frames
  • Visual-only focus β€” Missing scripts, audio, narrative structure, and storytelling depth
  • Manual production bottlenecks β€” Reference image acquisition, consistency checking, storyboard design, and shot planning all require human expertise

ViMax eliminates these bottlenecks by automating the entire pipeline from narrative input to final video output.

Four Creative Modes

ViMax offers four distinct creation modes, each powered by the same multi-agent engine:

1. Idea2Video β€” From Spark to Screen

Transform raw ideas into complete video stories. Simply describe your concept and ViMax autonomously handles scriptwriting, storyboarding, character creation, and final video generation β€” all end-to-end.

idea = """
If a cat and a dog are best friends, what would happen when they meet a new cat?
"""
user_requirement = """
For children, do not exceed 3 scenes.
"""
style = "Cartoon"

2. Novel2Video β€” Smart Literary Adaptation

Transform complete novels into episodic video content with intelligent narrative compression, character tracking, and scene-by-scene visual adaptation. The RAG-based long script design engine analyzes lengthy stories and automatically segments them into multi-scene scripts while retaining key plot developments.

3. Script2Video β€” Unlimited Screenplay Creation

Write any screenplay β€” from personal stories to epic adventures β€” and ViMax brings it to life with complete control over every aspect of visual storytelling.

script = """
EXT. SCHOOL GYM - DAY
A group of students are practicing basketball in the gym...
John: (dribbling the ball) I'm going to score a basket!
Jane: (smiling) Good job, John!
"""
user_requirement = "Fast-paced with no more than 20 shots."
style = "Animate Style"

4. AutoCameo β€” Interactive Video from Your Photo

Upload your photo and ViMax intelligently integrates you as a character with consistent appearance and natural interactions throughout the entire video. Create your own cameo across limitless creative scripts.

ViMax Features

12 Specialized Agents

The power of ViMax lies in its agent architecture. Each agent handles a specific production role:

Agent Role
Screenwriter Generates scripts from ideas and requirements
Script Planner Structures narrative flow and scene boundaries
Script Enhancer Refines scripts for visual storytelling
Novel Compressor Compresses long novels into segmented scripts
Character Extractor Extracts character descriptions and traits
Scene Extractor Identifies scene boundaries and environments
Event Extractor Maps key events and plot points
Character Portraits Generator Creates visual character references
Storyboard Artist Designs shot-level storyboards with cinematography language
Camera Image Generator Simulates multi-camera filming angles
Reference Image Selector Intelligently selects reference images for first-frame accuracy
Best Image Selector Parallel image generation with MLLM/VLM consistency check
Global Information Planner Manages cross-scene continuity and temporal coherence

Technical Capabilities

RAG-Based Long Script Generation

The script design engine intelligently analyzes lengthy, novel-like stories and automatically segments them into multi-scene script format, meticulously ensuring all key plot developments and character dialogues are accurately retained.

Expressive Storyboard Design

Shot-level storyboard system creates expressive storyboards through cinematography language based on user requirements and target audiences, establishing narrative rhythm for subsequent video generation.

Multi-Camera Filming Simulation

Simulates multi-camera filming to deliver an immersive viewing experience while maintaining consistent character positioning and backgrounds within the same scene.

Intelligent Reference Image Selection

Intelligently selects the reference image required for the first frame of the current video, including storyboards from the previous timeline, ensuring accuracy of multiple characters and environmental elements as the video becomes longer.

Automated Consistency Check

Generates multiple images in parallel and selects the best consistent image as the first frame through MLLM/VLM β€” imitating the workflow of human creators who review and select the best takes.

High-Efficiency Parallel Shot Generation

Parallel processing for sequential shots captured from the same camera enables highly efficient video production.

Three Pipeline Architectures

ViMax ships with three pre-built pipelines:

Pipeline Entry Point Input Use Case
Idea2Video main_idea2video.py Natural language idea Quick creative exploration
Script2Video main_script2video.py Screenplay/script Controlled narrative production
Novel2Movie novel2movie_pipeline.py Full novel text Long-form literary adaptation

Supported Backends

ViMax supports multiple generation backends:

  • Image Generation: Google Nanobanana API, Yunwu API, Doubao Seedream
  • Video Generation: Google Veo API, Yunwu API, Doubao Seedance
  • Chat Models: OpenAI-compatible (via OpenRouter), MiniMax (M2.7 with 1M context, M2.5 with 204K context)
  • Reranking: BGE via Silicon API

Quick Start

# Clone and install
git clone https://github.com/HKUDS/ViMax.git
cd ViMax
uv sync

# Configure API keys in configs/idea2video.yaml

Configure your model and API keys in configs/idea2video.yaml:

chat_model:
  init_args:
    model: google/gemini-2.5-flash-lite-preview-09-2025
    model_provider: openai
    api_key: <YOUR_API_KEY>
    base_url: https://openrouter.ai/api/v1

image_generator:
  class_path: tools.ImageGeneratorNanobananaGoogleAPI
  init_args:
    api_key: <YOUR_API_KEY>

video_generator:
  class_path: tools.VideoGeneratorVeoGoogleAPI
  init_args:
    api_key: <YOUR_API_KEY>

Then run:

python main_idea2video.py

Why ViMax Matters

ViMax represents a paradigm shift in AI video generation β€” moving from single-shot clip generators to full production pipelines. By decomposing video creation into specialized agent roles (just like a real film crew), ViMax achieves:

  1. Long-form consistency β€” Characters and environments remain stable across hundreds of shots
  2. Creative freedom β€” From any narrative format (idea, script, novel) to finished video
  3. Professional quality β€” Automated storyboarding, shot design, and consistency validation
  4. Production efficiency β€” Parallel processing and intelligent resource management

For developers and creators looking to build AI-powered video production workflows, ViMax provides the most complete open-source multi-agent framework available today.

Repository: github.com/HKUDS/ViMax
Stars: 5.7K+ | License: MIT | Language: Python 3.12

Watch PyShine on YouTube

Contents