DeerFlow: ByteDance’s Open-Source SuperAgent Harness

DeerFlow (Deep Exploration and Efficient Research Flow) is an open-source Super Agent Harness developed by ByteDance that has taken the AI community by storm. With over 59,000 stars on GitHub and reaching the #1 spot on GitHub Trending on February 28, 2026, DeerFlow represents a paradigm shift in how we think about AI agent orchestration.

What sets DeerFlow apart from other agent frameworks is its comprehensive approach to task execution. It does not just provide an AI assistant that can chat - it provides a complete harness that orchestrates sub-agents, manages persistent memory, executes code in isolated sandboxes, and accomplishes complex multi-step tasks through extensible skills. The project is a ground-up rewrite (version 2.0) that shares no code with its predecessor, designed from scratch to handle real-world workloads.

System Architecture

DeerFlow Architecture

Understanding the DeerFlow Architecture

The DeerFlow architecture represents a sophisticated multi-layered system designed for enterprise-grade AI agent orchestration. At its core, the architecture follows a microservices pattern that separates concerns across distinct components while maintaining seamless communication channels.

Client Layer: The client layer serves as the primary interface for users interacting with DeerFlow. This includes web-based interfaces, mobile applications, and integrated development environments. The client communicates with the backend through a unified API gateway, ensuring consistent request handling regardless of the source. The architecture supports multiple client types simultaneously, allowing developers to interact via CLI tools like Claude Code while also supporting web dashboard access for monitoring and management.

Nginx Reverse Proxy: Acting as the traffic controller, Nginx handles load balancing, SSL termination, and request routing. This layer provides critical security benefits by hiding the internal service topology from external clients. Nginx also manages static content delivery for the frontend application, reducing load on the application servers. The proxy configuration supports WebSocket connections for real-time streaming responses, essential for the interactive chat experience that DeerFlow provides.

Frontend Application: Built with modern web technologies, the frontend delivers a responsive user interface for interacting with agents. It handles thread management, file uploads, skill selection, and real-time response streaming. The frontend communicates with both the Gateway API and the LangGraph server, depending on the operation type. User sessions are managed client-side with secure token handling, and the interface adapts to different screen sizes for desktop and mobile usage.

Gateway API: The Gateway API serves as the central orchestration point for all backend operations. It handles authentication, request validation, and routing to appropriate services. The Gateway manages thread lifecycle, file upload processing, memory operations, and skill management. In Gateway mode (an experimental deployment option), this component also embeds the agent runtime directly, eliminating the need for a separate LangGraph server process. This reduces resource overhead and simplifies deployment for smaller installations.

LangGraph Server: At the heart of DeerFlow lies the LangGraph server, which manages agent execution using LangChain’s LangGraph framework. This server handles the complex state machine logic required for multi-turn conversations, tool execution, and sub-agent delegation. The LangGraph server maintains conversation history, manages checkpoint states for resumable executions, and coordinates between different agent components. It supports both synchronous and asynchronous execution modes, allowing long-running tasks to proceed without blocking the main thread.

Database Layer: The architecture includes persistent storage for conversation threads, memory data, and user preferences. The database layer supports multiple backends including SQLite for local development and PostgreSQL for production deployments. This layer ensures that conversation context is preserved across sessions and that the long-term memory feature functions correctly.

Communication Flow: When a user submits a request, it flows through Nginx to the Gateway API. The Gateway validates the request, checks authentication, and forwards it to the LangGraph server. The LangGraph server then orchestrates the agent execution, potentially spawning sub-agents, invoking skills, and executing tools. Results stream back through the same path, with the Gateway handling any necessary transformations before sending responses to the client.

Scalability Considerations: The architecture is designed for horizontal scaling. Each component can be scaled independently based on load requirements. The LangGraph server can run multiple workers to handle concurrent agent executions, while the Gateway API can be load-balanced across multiple instances. This separation allows organizations to allocate resources where they are needed most, whether that is handling more concurrent users or running more complex agent workflows.

Middleware Chain

DeerFlow Middleware

Understanding the Middleware Architecture

DeerFlow’s middleware system represents one of its most sophisticated architectural decisions, providing a composable pipeline for processing requests and responses. The middleware chain consists of 12 distinct components, each responsible for a specific aspect of request handling, security, or functionality enhancement.

Thread Data Middleware: This foundational component manages the lifecycle of conversation threads. It ensures that each request is associated with the correct thread context, loading previous messages and state when needed. The middleware handles thread creation, retrieval, and cleanup, maintaining the continuity that makes multi-turn conversations possible. It also manages thread-level configuration, allowing different threads to have different model settings or skill configurations.

Uploads Middleware: File handling is crucial for an agent that needs to process documents, images, and other user-provided content. The uploads middleware validates file types, manages storage locations, and ensures that uploaded files are accessible to the agent when needed. It implements security checks to prevent malicious file uploads and manages cleanup of temporary files after processing is complete.

Title Middleware: Automatic title generation helps users organize their conversations. This middleware analyzes the initial messages in a thread and generates a descriptive title, making it easier to find and resume previous conversations. The title generation uses the configured LLM to create meaningful summaries rather than generic timestamps.

Todo Middleware: Task management is essential for complex workflows. The todo middleware tracks pending tasks, manages task state transitions, and ensures that multi-step operations complete successfully. It provides visibility into what the agent is working on and helps users understand the progress of long-running operations.

Clarification Middleware: When user requests are ambiguous or incomplete, this middleware triggers clarification questions. It analyzes the request context and determines what additional information is needed before proceeding. This prevents the agent from making incorrect assumptions and improves the quality of responses.

Guardrail Middleware: Safety and compliance are enforced through the guardrail middleware. It checks requests against defined policies, filters sensitive content, and ensures that agent behavior stays within acceptable bounds. This is particularly important for enterprise deployments where compliance requirements must be met.

Loop Detection Middleware: Preventing infinite loops is critical for autonomous agents. This middleware monitors execution patterns and detects when the agent is stuck in a repetitive cycle. When detected, it can break the loop and provide alternative approaches or request user intervention.

Subagent Limit Middleware: Sub-agent delegation is powerful but can lead to resource exhaustion if unchecked. This middleware enforces limits on the number of concurrent sub-agents (defaulting to 3), preventing runaway spawning that could overwhelm system resources. It also manages sub-agent lifecycle and cleanup.

Sandbox Audit Middleware: Security auditing for sandbox operations is handled by this component. It logs all shell commands executed within sandboxes, tracks file system operations, and provides an audit trail for compliance and debugging purposes. This is essential for understanding what actions the agent took during execution.

LLM Error Handling Middleware: LLM API calls can fail for various reasons - rate limits, network issues, or model errors. This middleware implements retry logic with exponential backoff, handles graceful degradation when models are unavailable, and provides meaningful error messages to users when operations cannot be completed.

Tool Error Handling Middleware: Similar to LLM error handling, this component manages failures in tool execution. It determines which errors are recoverable and which should terminate the workflow, implements fallback strategies when available, and ensures that partial results are preserved when possible.

Dangling Tool Call Middleware: Sometimes tool executions are interrupted or orphaned. This middleware cleans up dangling tool calls, ensures proper state management for interrupted operations, and handles edge cases where responses arrive after timeouts have expired.

Pipeline Orchestration: The middleware chain is orchestrated as a pipeline where each component can modify the request, response, or both. Components can short-circuit the pipeline by returning early (useful for authentication failures or validation errors) or can add post-processing logic that runs after the main handler. This design provides maximum flexibility while maintaining clear separation of concerns.

Configuration and Extensibility: Each middleware component can be configured independently through the config.yaml file. Organizations can enable or disable specific middlewares, adjust their parameters, or add custom middleware implementations. This extensibility allows DeerFlow to adapt to different use cases and compliance requirements without modifying core code.

Skills System

DeerFlow Skills

Understanding the Progressive Skill Loading System

DeerFlow’s skills system is one of its most innovative features, implementing a three-tier progressive loading architecture that optimizes context window usage while maintaining extensibility. This design addresses a fundamental challenge in AI agent systems: how to provide agents with rich capabilities without overwhelming the context window with instructions.

Tier 1: Core Skills (Always Loaded): The first tier contains essential skills that are always available to the agent. These include fundamental capabilities like research, report generation, and basic file operations. Core skills are loaded at agent initialization and remain in context throughout the session. This ensures that the agent always has access to its most commonly needed capabilities without requiring explicit loading requests. The core skills are carefully curated to include only those capabilities that are frequently used across different task types.

Tier 2: Standard Skills (Loaded on Demand): The second tier contains specialized skills that are loaded when the task requires them. This includes capabilities like data analysis, image generation, podcast creation, and presentation building. When the agent determines that a task requires one of these skills, it requests the skill to be loaded into context. This on-demand loading keeps the context window lean while still providing access to a wide range of capabilities. The skill loading mechanism includes validation to ensure that skills are properly formatted and safe to execute.

Tier 3: Custom Skills (User-Defined): The third tier allows users to define their own skills through Markdown files. Custom skills can extend DeerFlow’s capabilities in domain-specific ways, whether that is specialized research workflows, organization-specific templates, or integration with proprietary systems. Custom skills follow the same structure as built-in skills, ensuring consistency in how they are discovered, loaded, and executed. Users can share custom skills through the skill archive format (.skill files), enabling community-driven capability expansion.

Skill Discovery and Selection: When a task arrives, the agent analyzes the requirements and determines which skills are relevant. This discovery process uses semantic matching to identify skills that align with the task, rather than requiring exact keyword matches. The agent can query the skill registry to find available capabilities and select the most appropriate ones for the current context. This intelligent selection ensures that the right tools are available without manual configuration.

Skill Structure and Components: Each skill is defined by a SKILL.md file that contains structured content including the skill name, description, instructions, and references to supporting resources. Skills can include templates for generating specific output formats, scripts for executing complex operations, and references to external documentation. This structure makes skills self-documenting and easy to understand, both for the agent and for human developers creating new skills.

Context Management: The progressive loading system is tightly integrated with DeerFlow’s context management strategy. As skills are loaded, their instructions are added to the context window. When skills are no longer needed, their instructions can be summarized and offloaded to keep the context focused on the current task. This dynamic context management allows DeerFlow to work with models that have smaller context windows while still providing access to a rich set of capabilities.

Skill Installation and Management: DeerFlow supports installing skills from external sources through the Gateway API. Users can install .skill archives that package skill definitions, templates, and scripts into a single distributable unit. The skill management system handles version checking, dependency resolution, and conflict detection. Skills can be enabled or disabled dynamically, allowing users to customize their DeerFlow instance for specific workflows.

Built-in Skills Overview: DeerFlow ships with over 20 built-in skills covering research, content creation, data analysis, and automation workflows. These include academic paper review, bootstrap (conversation guide), code documentation, consulting analysis, data analysis, deep research, skill discovery, frontend design, GitHub deep research, image generation, newsletter generation, podcast generation, PPT generation, skill creation, and more. Each skill is designed to work independently or in combination with others, enabling complex multi-step workflows.

Sandbox Architecture

DeerFlow Sandbox

Understanding the Sandbox Execution Environment

DeerFlow’s sandbox architecture is what truly distinguishes it from other agent frameworks. While many AI agents can only generate text or make API calls, DeerFlow provides a complete execution environment where agents can read files, write code, execute shell commands, and produce tangible outputs. This capability transforms DeerFlow from a conversational assistant into an autonomous worker.

Local Sandbox Provider: The simplest execution mode is the local sandbox provider. In this mode, operations execute directly on the host machine with file system access mapped to per-thread directories. This provides fast execution and easy debugging but offers no isolation - any code executed has the same permissions as the user running DeerFlow. For this reason, local sandbox mode is recommended only for trusted local development environments. File operations are still scoped to thread-specific directories, preventing accidental cross-thread interference, but shell execution is disabled by default for security.

Docker Sandbox Provider: The recommended mode for most deployments is the Docker sandbox provider. In this mode, each task executes within an isolated Docker container. The container has its own filesystem, network namespace, and process space, providing strong isolation from the host system. Skills, workspace files, and outputs are mounted into the container as volumes, allowing the agent to access necessary resources while preventing access to the host system. This isolation is critical when executing code from untrusted sources or when running in multi-user environments.

Kubernetes Sandbox Provider: For enterprise deployments requiring even stronger isolation and scalability, DeerFlow supports Kubernetes-based sandbox execution. In this mode, each task runs in its own Kubernetes pod, providing container-level isolation with the added benefits of Kubernetes orchestration. This includes automatic resource limits, scheduling across nodes, and integration with enterprise security policies. The provisioner service manages pod lifecycle, ensuring that resources are cleaned up after task completion.

File System Layout: Regardless of the sandbox type, DeerFlow maintains a consistent file system layout that agents can rely on. The /mnt/user-data/ directory contains three key subdirectories: uploads/ for user-provided files, workspace/ for the agent’s working directory, and outputs/ for final deliverables. Skills are mounted at /mnt/skills/, with separate directories for public skills (built-in) and custom skills (user-defined). This consistent structure allows skills to be portable across different sandbox implementations.

Security Considerations: The sandbox architecture implements multiple layers of security. Network access can be restricted or disabled entirely for sensitive operations. Resource limits prevent runaway processes from consuming all available CPU or memory. File system access is scoped to specific directories, preventing agents from accessing sensitive system files. Shell command execution requires explicit enablement and can be further restricted to specific commands. These controls make DeerFlow suitable for production deployments where security is a primary concern.

Execution Flow: When an agent needs to execute code or access files, the request flows through the sandbox abstraction layer. This layer determines which sandbox provider to use based on configuration, prepares the execution environment, and manages the lifecycle of the execution context. For Docker and Kubernetes providers, this includes container/pod creation, volume mounting, and cleanup after execution. The abstraction layer ensures that skills and tools work consistently regardless of the underlying sandbox implementation.

Performance Trade-offs: Each sandbox type offers different performance characteristics. Local execution is fastest but offers no isolation. Docker containers provide good isolation with moderate overhead - container startup takes a few seconds, but execution within the container is nearly as fast as local. Kubernetes pods have higher startup overhead but offer the strongest isolation and scalability benefits. Organizations should choose based on their specific requirements for security, performance, and scalability.

Audit and Logging: All sandbox operations are logged for audit and debugging purposes. The sandbox audit middleware records every shell command executed, every file accessed, and every operation performed within the sandbox. This audit trail is essential for understanding what actions the agent took, diagnosing failures, and meeting compliance requirements. Logs can be exported to external monitoring systems for long-term retention and analysis.

Key Features

DeerFlow packs an impressive array of features that make it suitable for both individual developers and enterprise teams:

Sub-Agent Delegation: Complex tasks are decomposed and distributed to specialized sub-agents. The lead agent can spawn up to 3 concurrent sub-agents, each with its own scoped context, tools, and termination conditions. Sub-agents run in parallel when possible and report back structured results for synthesis.

Long-Term Memory: Unlike stateless chatbots, DeerFlow builds persistent memory across sessions. It remembers user preferences, technical context, and accumulated knowledge. The more you use it, the better it understands your workflows and requirements.

Multi-Channel IM Integration: DeerFlow can receive tasks from messaging platforms including Telegram, Slack, Feishu, and WeCom. Each channel supports commands like /new, /status, /models, /memory, and /help for managing conversations.

MCP Server Support: The Model Context Protocol (MCP) allows DeerFlow to integrate with external tools and services. MCP servers can be configured to extend DeerFlow’s capabilities with custom functionality.

Model Agnostic: DeerFlow works with any LLM that implements the OpenAI-compatible API. This includes OpenAI GPT models, Anthropic Claude, Google Gemini, DeepSeek, Kimi, and local models through vLLM. The architecture abstracts provider differences, allowing seamless switching between models.

12 Middleware Components: The comprehensive middleware pipeline handles everything from thread management to error handling, ensuring robust and reliable operation.

20+ Built-in Skills: From research to report generation, from image creation to podcast production, DeerFlow includes skills for a wide range of tasks out of the box.

Technology Stack

DeerFlow is built on modern, production-ready technologies:

Component	Technology
Backend Framework	Python 3.12+, LangChain, LangGraph
Frontend	Node.js 22+, Modern Web Framework
Container Runtime	Docker, Kubernetes (optional)
Database	SQLite (dev), PostgreSQL (prod)
Reverse Proxy	Nginx
LLM Integration	OpenAI-compatible API
Tracing	LangSmith, Langfuse

Getting Started

Prerequisites

Python 3.12 or higher
Node.js 22 or higher
Docker (for sandbox execution)
Git

Installation

Clone the repository:

      
        git clone https://github.com/bytedance/deer-flow.git
cd deer-flow

Generate configuration files:
```
make config
```
Configure your model (edit config.yaml): ```yaml models:
- name: gpt-4 display_name: GPT-4 use: langchain_openai:ChatOpenAI model: gpt-4 api_key: $OPENAI_API_KEY max_tokens: 4096 temperature: 0.7 ```

Set API keys (create .env file):

      
        OPENAI_API_KEY=your-api-key
TAVILY_API_KEY=your-tavily-key

Start with Docker (recommended):

      
        make docker-init    # First time setup
make docker-start   # Start services

Access the interface: Open http://localhost:2026

Local Development

For development without Docker:

      
        make check    # Verify prerequisites
make install  # Install dependencies
make dev      # Start development server

Available Skills

DeerFlow includes over 20 built-in skills covering diverse use cases:

Skill	Description
academic-paper-review	Review and analyze academic papers
bootstrap	Conversation guide and onboarding
claude-to-deerflow	Integration with Claude Code CLI
code-documentation	Generate code documentation
consulting-analysis	Business consulting analysis
data-analysis	Data analysis and visualization
deep-research	Comprehensive research workflows
find-skills	Discover and install new skills
frontend-design	Frontend design and development
github-deep-research	Deep research on GitHub repositories
image-generation	AI-powered image generation
newsletter-generation	Create newsletters from content
podcast-generation	Generate podcast content
ppt-generation	Create PowerPoint presentations
skill-creator	Create and package new skills
surprise-me	Random skill discovery
vercel-deploy-claimable	Deploy to Vercel
video-generation	Generate video content

Security Considerations

DeerFlow has powerful capabilities including system command execution and file operations. The project is designed for deployment in trusted local environments (127.0.0.1). For cross-network deployment, implement:

IP allowlist via iptables or firewall ACLs
Authentication gateway with strong pre-authentication
Network isolation in dedicated VLANs
Regular security updates

Conclusion

DeerFlow represents a significant advancement in AI agent frameworks. By combining sub-agent orchestration, persistent memory, sandbox execution, and extensible skills, it provides a complete platform for building autonomous AI workers. The architecture is designed for real-world use, with enterprise features like middleware-based request handling, multiple sandbox options, and comprehensive security controls.

Whether you are a researcher looking to automate literature reviews, a developer building data pipelines, or a content creator automating production workflows, DeerFlow provides the infrastructure to get the job done. The active community and growing ecosystem of skills make it a compelling choice for anyone serious about AI agent development.

The project’s rapid growth to 59,000+ stars and #1 GitHub Trending status speaks to the demand for this type of comprehensive agent framework. As AI continues to evolve, platforms like DeerFlow will be essential for turning AI capabilities into practical, production-ready applications.

DeerFlow: ByteDance's Open-Source SuperAgent Harness