Onyx: Open Source AI Platform with Advanced RAG and Agent Capabilities
Onyx is an open-source AI platform that has drawn rapid adoption in the developer community: at the time of writing it has 25,956 stars and 3,460 forks on GitHub, with 5,449 of those stars gained in a single week, making it one of the fastest-growing AI projects in the open-source ecosystem. The platform represents a significant step forward for enterprise AI applications, combining Retrieval-Augmented Generation (RAG) with sophisticated agent capabilities.
The platform addresses a critical need in the AI landscape: providing organizations with a self-hosted, privacy-focused alternative to commercial AI solutions while maintaining enterprise-grade features and scalability. Built with modern technologies including Python 3.11, FastAPI, Next.js 15, and PostgreSQL, Onyx delivers a production-ready solution for organizations seeking to leverage AI capabilities without compromising data sovereignty.
Architecture Overview
Understanding the Onyx Architecture
The Onyx architecture represents a sophisticated multi-tier system designed for enterprise scalability and flexibility. At its core, the platform follows a modern microservices-inspired design that separates concerns while maintaining cohesive data flow between components.
Frontend Layer: Next.js 15 with React 18
The frontend is built on Next.js 15, leveraging the latest features including server-side rendering (SSR), static site generation (SSG), and the new App Router architecture. React 18 provides the foundation for a responsive, component-based user interface that delivers a seamless experience across devices. TypeScript ensures type safety throughout the codebase, reducing runtime errors and improving developer productivity. Tailwind CSS enables rapid UI development with utility-first styling, resulting in a consistent and maintainable design system.
The frontend communicates with the backend through a well-defined REST API, with real-time capabilities powered by WebSocket connections for features like streaming responses and live collaboration. The architecture supports both traditional web access and API-first integrations, making it suitable for embedding into existing enterprise workflows.
Backend Layer: Python 3.11 with FastAPI
The backend leverages Python 3.11’s significant performance improvements, including faster CPython execution and enhanced error messaging. FastAPI provides a modern, high-performance web framework with automatic OpenAPI documentation, request validation through Pydantic models, and native async/await support for handling concurrent requests efficiently.
SQLAlchemy serves as the ORM layer, providing database abstraction while maintaining the flexibility to write complex queries when needed. Alembic handles database migrations, ensuring schema changes are version-controlled and can be applied consistently across environments. Celery manages background task processing, enabling long-running operations like document indexing and batch processing without blocking the main application thread.
Data Layer: PostgreSQL with Redis Caching
PostgreSQL serves as the primary data store, chosen for its robust feature set including full-text search capabilities, JSON support for flexible document storage, and excellent performance under load. The database schema is designed to support multi-tenancy, allowing organizations to isolate data between different departments or clients.
Redis provides a high-performance caching layer that significantly reduces response times for frequently accessed data. The caching strategy includes query result caching, session management, and rate limiting to protect against abuse. The architecture also supports Redis as a message broker for Celery, creating a unified infrastructure for both caching and task queuing.
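The query-result caching described above follows a standard cache-aside pattern: check the cache, and on a miss run the query and store the result with a TTL. The sketch below is illustrative rather than Onyx's actual implementation, and it uses an in-process dict where a real deployment would issue Redis `SETEX`/`GET` commands:

```python
import hashlib
import json
import time


class QueryCache:
    """Cache-aside layer for search results. A real deployment would back
    this with Redis (SETEX/GET) instead of the in-process dict used here."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def _key(self, query, filters):
        # Stable hash of the query plus any filters, so identical
        # requests map to the same cache entry.
        raw = json.dumps({"q": query, "f": filters}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get_or_compute(self, query, filters, compute):
        key = self._key(query, filters)
        entry = self._store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]  # cache hit
        value = compute(query, filters)  # cache miss: run the real search
        self._store[key] = (time.time() + self.ttl, value)
        return value


calls = []
cache = QueryCache(ttl_seconds=60)

def fake_search(query, filters):
    calls.append(query)
    return [f"result for {query}"]

first = cache.get_or_compute("pricing policy", None, fake_search)
second = cache.get_or_compute("pricing policy", None, fake_search)
# The second call is served from cache, so fake_search runs only once.
```

Hashing the query together with its filters keeps cache entries from leaking across different permission scopes, which matters in a multi-tenant setup.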
Search Layer: Vespa Vector Database
Vespa powers the search capabilities, offering both traditional full-text search and vector-based semantic search. This dual approach enables users to find information through keyword matching while also discovering relevant content through semantic similarity. The vector database stores embeddings generated by multiple supported models, allowing for flexible model selection based on accuracy and performance requirements.
Key Architectural Benefits:
- Scalability: Each layer can be scaled independently based on demand, with horizontal scaling supported for the application layer and vertical scaling for databases.
- Resilience: The architecture includes built-in failover mechanisms, with Redis providing session persistence and PostgreSQL supporting streaming replication for high availability.
- Security: OAuth2 and SAML authentication integrate with existing identity providers, while role-based access control (RBAC) ensures fine-grained permissions management.
- Extensibility: The modular design allows organizations to extend functionality through custom agents, actions, and integrations without modifying core components.
RAG Pipeline
Understanding the Agentic RAG Pipeline
Onyx’s Agentic RAG (Retrieval-Augmented Generation) pipeline represents a significant evolution beyond traditional RAG implementations. While conventional RAG systems simply retrieve documents and pass them to an LLM, Onyx’s approach combines hybrid indexing with intelligent AI agents that actively participate in the retrieval and synthesis process.
Document Ingestion and Processing
The pipeline begins with document ingestion, supporting a wide variety of formats including PDFs, Word documents, HTML pages, plain text files, and structured data from databases. Each document undergoes a sophisticated processing pipeline:
- Text Extraction: Specialized parsers extract text while preserving structure and metadata. For PDFs, this includes maintaining reading order and extracting tables as structured data.
- Chunking Strategy: Documents are split into semantically meaningful chunks using intelligent algorithms that respect document structure. Rather than simple fixed-size chunking, Onyx employs recursive character text splitting with overlap, ensuring context is preserved across chunk boundaries.
- Embedding Generation: Multiple embedding models are supported, including OpenAI's text-embedding-3 series, Cohere's embed models, and open-source alternatives like sentence-transformers. The system can use different embedding models for different document types based on optimization requirements.
- Dual Indexing: Each chunk is indexed in both Vespa's traditional inverted index for keyword search and its vector index for semantic search. This hybrid approach ensures users can find information through both exact matching and conceptual similarity.
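The core idea behind overlap-aware chunking can be sketched in a few lines. This is a simplified stand-in for the recursive character splitting described above (real splitters also prefer paragraph and sentence boundaries over hard character cuts):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping fixed-size chunks so that context
    spanning a boundary appears in both neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks


# 500 characters of varied text for demonstration
doc = "".join(chr(97 + i % 26) for i in range(500))
chunks = chunk_text(doc, chunk_size=200, overlap=50)
# Consecutive chunks share `overlap` characters of context.
```

The overlap is what keeps a sentence that straddles a boundary retrievable from either side; without it, queries matching boundary-spanning text would miss both chunks.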
Hybrid Search Implementation
The hybrid search mechanism combines multiple retrieval strategies to maximize recall while maintaining precision:
- Keyword Search: BM25 ranking provides excellent results for exact term matching, particularly effective for technical terminology, product names, and specific identifiers.
- Semantic Search: Vector similarity search enables finding conceptually related content even when keywords don't match. This is particularly valuable for finding answers to questions phrased differently from the source material.
- Fusion Ranking: Results from both search methods are combined using reciprocal rank fusion (RRF), producing a unified ranking that leverages the strengths of both approaches.
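Reciprocal rank fusion is simple enough to show in full. Each document scores the sum of 1/(k + rank) over every ranked list it appears in, so documents that rank well in both keyword and semantic results rise to the top (the document IDs here are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked result lists with reciprocal rank fusion.
    k=60 is the constant from the original RRF paper; it damps the
    influence of any single list's top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # BM25 ranking
semantic_hits = ["doc_c", "doc_a", "doc_d"]  # vector-similarity ranking
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# doc_a (ranks 1 and 2) narrowly outscores doc_c (ranks 3 and 1).
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is exactly why it is a popular fusion choice for hybrid search.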
Agentic Enhancement
What sets Onyx apart is the integration of AI agents into the RAG process. Rather than simply retrieving and presenting documents, agents can:
- Query Refinement: Analyze the user's question and generate optimized search queries that capture the intent more effectively than the original question.
- Multi-hop Retrieval: When initial results are insufficient, agents can formulate follow-up queries to gather additional context, building a comprehensive understanding of the topic.
- Source Verification: Agents cross-reference information across multiple sources, identifying contradictions and highlighting the most reliable information.
- Context Synthesis: Rather than simply concatenating retrieved chunks, agents synthesize information from multiple sources to provide coherent, comprehensive answers.
Re-ranking and Quality Assurance
After initial retrieval, a re-ranking model evaluates and reorders results based on relevance to the specific query. This two-stage retrieval process (initial retrieval followed by re-ranking) significantly improves answer quality while maintaining reasonable latency. The system also includes deduplication logic to avoid presenting redundant information from similar documents.
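The deduplication step can be approximated with a greedy filter over the re-ranked list: keep a chunk only if it is not too similar to anything already kept. The token-set Jaccard measure and the 0.7 threshold below are illustrative choices, not Onyx's actual implementation:

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two chunks of text."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(chunks, threshold=0.8):
    """Drop chunks that are near-duplicates of an already-kept chunk.
    Assumes `chunks` is ordered best-first (i.e. after re-ranking),
    so the higher-ranked copy of any duplicate pair survives."""
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk, existing) < threshold for existing in kept):
            kept.append(chunk)
    return kept


ranked = [
    "the quarterly report shows revenue growth",
    "the quarterly report shows revenue growth this year",
    "employee onboarding checklist for new hires",
]
unique = deduplicate(ranked, threshold=0.7)
# The second chunk is a near-duplicate of the first and is dropped.
```

Processing the list best-first means the greedy filter never discards a strong result in favor of a weaker duplicate.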
Performance Optimization
The pipeline incorporates several performance optimizations:
- Caching: Frequently accessed embeddings and search results are cached in Redis, reducing database load and improving response times.
- Parallel Processing: Document processing and embedding generation occur in parallel using Celery workers, maximizing throughput for large document collections.
- Incremental Updates: Rather than rebuilding the entire index when documents change, Onyx supports incremental updates, adding or removing documents without downtime.
Agent System
Understanding the Agent System Architecture
Onyx’s agent system provides a powerful framework for creating AI agents with unique instructions, knowledge bases, and action capabilities. This system enables organizations to build specialized assistants tailored to specific workflows, departments, or use cases without requiring deep AI expertise.
Core Agent Components
1. Agent Definition and Configuration
Each agent is defined through a comprehensive configuration that includes:
- System Prompt: The foundational instructions that define the agent's personality, expertise, and behavioral constraints. This prompt establishes the agent's role, communication style, and decision-making framework.
- Knowledge Base Assignment: Agents can be connected to specific document collections, ensuring they have access to relevant information while maintaining data isolation between different organizational units.
- Tool Integration: Agents can be equipped with various tools and actions, extending their capabilities beyond text generation to include external API calls, database queries, and workflow automation.
- Model Selection: Different agents can use different LLM backends based on their requirements. A customer service agent might use a fast, cost-effective model, while a research agent might use a more capable model for complex reasoning tasks.
2. Custom Agent Creation
Organizations can create custom agents through a user-friendly interface or programmatically through the API. The creation process involves:
- Instruction Design: Crafting clear, comprehensive instructions that guide the agent's behavior. Onyx provides templates and best practices for effective instruction design.
- Knowledge Curation: Selecting which documents and data sources the agent should have access to, enabling fine-grained control over information access.
- Action Configuration: Defining what external actions the agent can perform, including API integrations, database operations, and workflow triggers.
- Testing and Iteration: Built-in testing tools allow administrators to evaluate agent performance before deployment, with iteration capabilities to refine instructions and configurations.
3. Multi-Agent Orchestration
For complex tasks, Onyx supports multi-agent orchestration where multiple specialized agents collaborate:
- Task Decomposition: A coordinator agent breaks down complex requests into subtasks suitable for specialized agents.
- Parallel Execution: Independent subtasks are distributed to appropriate agents and executed concurrently, reducing overall response time.
- Result Synthesis: The coordinator agent combines outputs from specialized agents into a coherent final response.
- Conflict Resolution: When agents provide contradictory information, the system includes mechanisms to identify conflicts and seek clarification or additional sources.
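The decompose / execute-in-parallel / synthesize loop maps naturally onto `asyncio`. The sketch below is a minimal illustration of the pattern, not Onyx's orchestration code: the specialist agents are stubs, and a real coordinator would derive the subtask split from the request with an LLM call:

```python
import asyncio


async def run_specialist(name, subtask):
    """Stand-in for a specialized agent; a real system would await an LLM."""
    await asyncio.sleep(0)  # yield control, simulating I/O
    return f"{name} answered: {subtask}"


async def orchestrate(request):
    # 1. Task decomposition (hardcoded here; a coordinator agent would
    #    produce this split from the request).
    subtasks = {
        "research": f"gather background on: {request}",
        "analysis": f"analyze implications of: {request}",
    }
    # 2. Parallel execution of the independent subtasks.
    results = await asyncio.gather(
        *(run_specialist(name, task) for name, task in subtasks.items())
    )
    # 3. Result synthesis into one response.
    return " | ".join(results)


final = asyncio.run(orchestrate("migrate billing to a new provider"))
```

Because the subtasks are independent, total latency approaches that of the slowest specialist rather than the sum of all of them.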
4. Actions and MCP Integration
The Model Context Protocol (MCP) integration enables agents to interact with external applications:
- Tool Definition: Each action is defined with clear input schemas, output schemas, and execution parameters, ensuring type safety and predictable behavior.
- Authentication: Flexible authentication mechanisms support various credential types, including API keys, OAuth tokens, and certificate-based authentication.
- Execution Environment: Actions run in isolated environments with appropriate permissions and resource limits, preventing unauthorized access and ensuring system stability.
- Audit Logging: All action executions are logged for compliance and debugging purposes, providing a complete trail of agent activities.
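The value of schema-checked tool definitions is easiest to see in code. This is a deliberately simplified illustration: MCP itself describes tool inputs with JSON Schema, whereas the sketch below uses a plain name-to-type mapping, and `ToolDefinition`, `create_ticket`, and the handler are all hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class ToolDefinition:
    """Illustrative action definition: the input schema is checked before
    the handler runs, mirroring the type-safety goal described above."""
    name: str
    input_schema: dict  # parameter name -> expected Python type
    handler: callable = field(repr=False, default=None)

    def invoke(self, arguments):
        # Reject unknown or missing parameters before execution.
        if set(arguments) != set(self.input_schema):
            raise ValueError(f"expected arguments {sorted(self.input_schema)}")
        # Reject arguments of the wrong type.
        for key, expected_type in self.input_schema.items():
            if not isinstance(arguments[key], expected_type):
                raise TypeError(f"{key} must be {expected_type.__name__}")
        return self.handler(**arguments)


ticket_tool = ToolDefinition(
    name="create_ticket",
    input_schema={"title": str, "priority": int},
    handler=lambda title, priority: {"id": 1, "title": title, "priority": priority},
)
created = ticket_tool.invoke({"title": "VPN outage", "priority": 2})
```

Validating before execution means a malformed LLM tool call fails with a clear error the agent can react to, instead of a half-completed side effect.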
5. Code Execution Sandbox
For data analysis and file manipulation tasks, Onyx includes a secure code execution environment:
- Language Support: Python is the primary language for code execution, with pre-installed libraries for data analysis, visualization, and machine learning.
- Resource Limits: CPU, memory, and execution time limits prevent runaway processes from affecting system stability.
- File System Access: Controlled access to uploaded files and generated artifacts, with automatic cleanup after task completion.
- Output Capture: Both standard output and error streams are captured and returned to the user, enabling interactive debugging and iteration.
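A minimal version of "run untrusted code with a time limit and capture its output" can be built on `subprocess`. This is a sketch of the pattern, not Onyx's sandbox: a production sandbox adds CPU and memory rlimits, a restricted filesystem, and network isolation on top of the wall-clock limit shown here:

```python
import subprocess
import sys


def run_sandboxed(code, timeout_seconds=5):
    """Run Python source in a child process with a wall-clock limit,
    capturing stdout and stderr separately."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
        return {
            "stdout": completed.stdout,
            "stderr": completed.stderr,
            "exit_code": completed.returncode,
        }
    except subprocess.TimeoutExpired:
        # The child is killed when the timeout expires.
        return {"stdout": "", "stderr": "execution timed out", "exit_code": -1}


result = run_sandboxed("print(sum(range(10)))")
# result["stdout"].strip() == "45"
```

Running the code in a separate process (rather than `exec` in-process) is what makes the time limit enforceable: the parent can kill the child without corrupting its own state.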
6. Voice Mode Capabilities
The agent system includes comprehensive voice interaction support:
- Speech-to-Text: Multiple STT backends are supported, including OpenAI's Whisper, Google Speech-to-Text, and self-hosted alternatives for privacy-sensitive deployments.
- Text-to-Speech: Natural-sounding voice output through integrations with various TTS providers, with options for different voices, languages, and speaking rates.
- Real-time Processing: Streaming audio processing enables natural conversational flow without long pauses for processing.
7. Image Generation Integration
Agents can generate images through integrations with image generation models:
- Prompt Engineering: Agents automatically refine user requests into optimized prompts for image generation models.
- Model Selection: Support for multiple image generation backends, including DALL-E, Stable Diffusion, and other open-source alternatives.
- Style Control: Fine-grained control over image style, composition, and quality parameters.
Deployment Modes
Understanding Deployment Options
Onyx offers flexible deployment options to meet diverse organizational needs, from small teams testing the platform to large enterprises requiring high-availability configurations. Each deployment mode is designed to balance ease of setup with production-grade capabilities.
1. Docker Compose Deployment
The simplest deployment option uses Docker Compose, ideal for development, testing, and small-scale production deployments:
- Quick Start: A single `docker compose up` command launches all required services, including the application server, PostgreSQL database, Redis cache, Vespa search engine, and Celery workers.
- Configuration: Environment variables control all aspects of the deployment, from database credentials to LLM API keys. A `.env` file provides a convenient way to manage configuration.
- Resource Management: Docker Compose allows specifying resource limits for each service, preventing any single component from consuming excessive resources.
- Volume Management: Persistent volumes ensure data survives container restarts, with separate volumes for database storage, document uploads, and logs.
- Networking: Internal networking between containers is automatically configured, while port mappings expose only necessary services to the host.
Best For: Development teams, proof-of-concept deployments, small organizations with limited infrastructure requirements.
2. Kubernetes Deployment
For organizations requiring enterprise-grade scalability and management, Onyx provides comprehensive Kubernetes support:
- Helm Charts: Official Helm charts simplify deployment, with configurable values for replica counts, resource limits, and ingress settings.
- Horizontal Pod Autoscaling: Kubernetes HPA automatically scales application pods based on CPU and memory utilization, handling traffic spikes without manual intervention.
- Rolling Updates: Zero-downtime deployments are achieved through Kubernetes rolling update strategies, ensuring continuous availability during upgrades.
- Service Mesh Integration: Optional integration with service meshes like Istio provides advanced traffic management, security policies, and observability.
- Secret Management: Kubernetes secrets integrate with external secret management systems like HashiCorp Vault, ensuring sensitive credentials are properly secured.
- Multi-zone Deployment: Kubernetes enables deployment across multiple availability zones for high availability, with automatic failover if a zone becomes unavailable.
Best For: Medium to large organizations, cloud-native environments, teams with existing Kubernetes infrastructure.
3. Cloud Provider Deployments
Onyx can be deployed on major cloud providers with provider-specific optimizations:
AWS Deployment:
- ECS/Fargate: Serverless container deployment eliminates the need to manage underlying infrastructure.
- RDS for PostgreSQL: Managed database service with automated backups, multi-AZ deployment, and automated failover.
- ElastiCache for Redis: Managed Redis with automatic failover and cluster mode for high availability.
- OpenSearch Service: Alternative to Vespa for organizations already invested in AWS search services.
Google Cloud Deployment:
- Cloud Run: Serverless container platform with automatic scaling based on request volume.
- Cloud SQL: Managed PostgreSQL with high availability configuration and automated backups.
- Memorystore: Managed Redis service with sub-millisecond latency.
Azure Deployment:
- Azure Container Apps: Serverless container service with KEDA-based autoscaling.
- Azure Database for PostgreSQL: Managed database with flexible server configuration.
- Azure Cache for Redis: Enterprise-grade Redis with clustering support.
Best For: Organizations with existing cloud infrastructure, teams preferring managed services over self-hosted infrastructure.
4. Self-Hosted Bare Metal Deployment
For organizations with strict data sovereignty requirements or existing infrastructure investments:
- Manual Installation: Step-by-step guides for installing each component on bare metal servers or virtual machines.
- High Availability Configuration: Detailed instructions for configuring PostgreSQL streaming replication, Redis Sentinel for failover, and load balancing for application servers.
- Monitoring Integration: Support for Prometheus, Grafana, and other monitoring tools for comprehensive observability.
- Backup and Recovery: Automated backup scripts with tested recovery procedures for disaster recovery scenarios.
Best For: Organizations with strict data sovereignty requirements, air-gapped environments, teams with dedicated infrastructure management capabilities.
Deployment Considerations
When selecting a deployment mode, consider:
- Scale Requirements: Expected concurrent users, document volume, and query frequency influence infrastructure sizing.
- Availability Requirements: Mission-critical deployments require high-availability configurations with automatic failover.
- Security Requirements: Data sensitivity determines whether cloud deployment is acceptable or self-hosting is required.
- Operational Capacity: Available DevOps expertise influences whether managed services or self-managed infrastructure is more appropriate.
- Budget Constraints: Cloud managed services reduce operational overhead but increase direct costs; self-hosting requires more expertise but can be more cost-effective at scale.
Key Features
Agentic RAG
Onyx’s Agentic RAG combines the best of traditional search with AI-powered intelligence. Unlike conventional RAG systems that passively retrieve documents, Agentic RAG actively participates in the information retrieval process. The system understands query intent, performs multi-hop retrieval when necessary, and synthesizes information from multiple sources to provide comprehensive answers.
Key Capabilities:
- Hybrid search combining keyword and semantic retrieval
- Query refinement and expansion for improved recall
- Multi-hop retrieval for complex questions
- Source citation and verification
- Context-aware chunking and retrieval
Deep Research
The Deep Research feature represents Onyx’s most sophisticated capability, achieving top rankings on the Deep Research leaderboard in February 2026. This feature enables the system to conduct comprehensive research on complex topics through multi-step investigation.
Research Process:
- Query Analysis: Understanding the research question and identifying key concepts
- Source Discovery: Finding relevant sources across connected knowledge bases and web search
- Information Extraction: Extracting key facts, figures, and insights from sources
- Cross-Reference: Verifying information across multiple sources
- Synthesis: Combining findings into a comprehensive research report
- Citation: Providing proper attribution for all sourced information
Custom Agents
Organizations can create specialized AI agents tailored to specific use cases:
- Customer Service Agents: Trained on product documentation and support tickets
- Research Assistants: Connected to academic databases and research repositories
- HR Assistants: Knowledgeable about company policies and procedures
- Sales Enablement: Equipped with product information and competitive intelligence
Web Search Integration
Onyx integrates with multiple web search providers for real-time information retrieval:
| Provider | Features | Best For |
|---|---|---|
| Serper | Fast, cost-effective Google Search API | General web search |
| Google PSE | Custom search engines, refined results | Domain-specific search |
| Brave Search | Privacy-focused, independent index | Privacy-sensitive applications |
| SearXNG | Self-hosted, metasearch aggregation | Complete control over search |
| Firecrawl/Exa | AI-optimized search with content extraction | Research applications |
Artifacts Generation
The Artifacts feature enables agents to create downloadable content:
- Documents: Generate reports, summaries, and documentation in various formats
- Graphics: Create diagrams, charts, and visualizations
- Code: Produce executable code snippets with explanations
- Data Files: Export structured data in CSV, JSON, or other formats
Actions and MCP
The Model Context Protocol integration enables agents to interact with external systems:
- API Integration: Connect to any REST or GraphQL API
- Database Operations: Query and update databases with proper authentication
- Workflow Triggers: Initiate business processes in external systems
- File Operations: Read, write, and manage files in connected storage
Code Execution
The secure sandbox environment enables data analysis and computation:
```python
# Example: Data analysis in the sandbox
import pandas as pd
import matplotlib.pyplot as plt

# Load uploaded data
df = pd.read_csv('/data/uploaded_file.csv')

# Perform analysis
summary = df.describe()
correlation = df.corr(numeric_only=True)

# Generate visualization (pass figsize to plot() so it takes effect)
ax = df.plot(kind='bar', figsize=(10, 6))
ax.figure.savefig('/output/analysis_result.png')
plt.close(ax.figure)
```
Voice Mode
Comprehensive voice interaction capabilities:
- Speech-to-Text: Multiple backend support including Whisper
- Text-to-Speech: Natural voice output with various voice options
- Real-time Processing: Streaming audio for natural conversation flow
- Language Support: Multiple languages for global deployments
Image Generation
AI-powered image creation:
- Multiple Backends: DALL-E, Stable Diffusion, and open-source alternatives
- Prompt Optimization: Automatic prompt refinement for better results
- Style Control: Fine-grained control over artistic style and composition
- Batch Generation: Generate multiple variations for selection
Installation
Prerequisites
- Docker and Docker Compose (for containerized deployment)
- Python 3.11+ (for development deployment)
- PostgreSQL 14+ (if not using Docker)
- Redis 6+ (if not using Docker)
Quick Start with Docker Compose
```bash
# Clone the repository
git clone https://github.com/onyx-dot-app/onyx.git
cd onyx

# Copy environment configuration
cp .env.example .env

# Edit .env with your configuration
# Required: LLM API keys (OpenAI, Anthropic, etc.)
# Required: Database credentials
# Optional: OAuth/SAML configuration

# Start all services
docker compose up -d

# Access the application
# Web UI: http://localhost:3000
# API: http://localhost:8080
```
Configuration
Key environment variables:
```bash
# Database
POSTGRES_USER=onyx
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=onyx

# Redis
REDIS_URL=redis://redis:6379/0

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Search
VESPA_HOST=vespa
VESPA_PORT=19071

# Authentication
OAUTH_CLIENT_ID=your_client_id
OAUTH_CLIENT_SECRET=your_client_secret
```
Kubernetes Deployment
```bash
# Add the Onyx Helm repository
helm repo add onyx https://helm.onyx.app
helm repo update

# Install with default configuration
helm install onyx onyx/onyx

# Or with custom values
helm install onyx onyx/onyx -f values.yaml
```
Usage Examples
Creating a Custom Agent
```python
import requests

# Create a new agent
agent_config = {
    "name": "Research Assistant",
    "description": "Specialized in academic research",
    "system_prompt": """You are a research assistant specialized in
academic literature. Help users find relevant papers, summarize
findings, and identify research gaps.""",
    "knowledge_base_ids": ["academic_papers", "research_data"],
    "tools": ["web_search", "document_analysis"],
    "model": "gpt-4"
}

response = requests.post(
    "http://localhost:8080/api/agents",
    json=agent_config,
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
agent_id = response.json()["id"]
```
Querying with RAG
```python
import requests

# Submit a query
query = {
    "query": "What are the latest advances in quantum computing?",
    "agent_id": agent_id,  # ID of the agent created above
    "options": {
        "search_type": "hybrid",
        "max_sources": 10,
        "include_citations": True
    }
}

response = requests.post(
    "http://localhost:8080/api/query",
    json=query,
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
result = response.json()

print(result["answer"])
for citation in result["citations"]:
    print(f"Source: {citation['source']}")
```
Deep Research
```python
import time

import requests

# Initiate deep research
research_request = {
    "topic": "Impact of AI on healthcare diagnostics",
    "depth": "comprehensive",
    "sources": ["web", "knowledge_base"],
    "output_format": "report"
}
response = requests.post(
    "http://localhost:8080/api/research",
    json=research_request,
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
research_id = response.json()["research_id"]

# Poll for completion, bailing out if the task fails
while True:
    status = requests.get(
        f"http://localhost:8080/api/research/{research_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"}
    ).json()
    if status["status"] == "completed":
        print(status["report"])
        break
    if status["status"] == "failed":
        raise RuntimeError("Research task failed")
    time.sleep(10)
```
Technology Stack
Backend Technologies
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.11 | Core backend development |
| Framework | FastAPI | REST API and async handling |
| ORM | SQLAlchemy | Database abstraction |
| Migrations | Alembic | Schema version control |
| Task Queue | Celery | Background job processing |
| Cache | Redis | Session and query caching |
| Database | PostgreSQL | Primary data storage |
| Search | Vespa | Vector and full-text search |
Frontend Technologies
| Component | Technology | Purpose |
|---|---|---|
| Framework | Next.js 15 | Server-side rendering |
| UI Library | React 18 | Component architecture |
| Language | TypeScript | Type safety |
| Styling | Tailwind CSS | Utility-first styling |
| State | Zustand | Client state management |
AI/ML Components
| Component | Technology | Purpose |
|---|---|---|
| LLM Framework | LangChain | Agent orchestration |
| LLM Gateway | LiteLLM | Multi-provider support |
| Embeddings | Multiple | Document vectorization |
| Vector DB | Vespa | Similarity search |
Conclusion
Onyx represents a significant advancement in open-source AI platforms, providing organizations with a powerful, flexible, and privacy-respecting alternative to commercial solutions. Its combination of Agentic RAG, Deep Research capabilities, and extensible agent system makes it suitable for a wide range of enterprise applications.
The platform’s modular architecture allows organizations to start small and scale as needed, with deployment options ranging from simple Docker Compose setups to enterprise Kubernetes clusters. The comprehensive feature set, including voice interaction, image generation, and code execution, positions Onyx as a complete solution for AI-powered knowledge management and automation.
With its rapidly growing community and active development, Onyx is poised to remain at the forefront of open-source AI platforms. Organizations looking to implement AI capabilities while maintaining control over their data should consider Onyx as a compelling option.
Related Posts
- AgentSkillOS: Skill Orchestration System
- MattPocock Skills: AI Agent Workflows
- AI Hedge Fund: Multi-Agent Investment System