Onyx: Open Source AI Platform with Advanced RAG and Agent Capabilities
Onyx is an open-source AI platform that has drawn rapid adoption in the developer community: at the time of writing it has 25,956 stars and 3,460 forks on GitHub, with 5,449 of those stars gained in a single week, making it one of the fastest-growing AI projects in the open-source ecosystem. The platform represents a significant step forward for enterprise AI applications, combining Retrieval-Augmented Generation (RAG) with sophisticated agent capabilities.
The platform addresses a critical need in the AI landscape: providing organizations with a self-hosted, privacy-focused alternative to commercial AI solutions while maintaining enterprise-grade features and scalability. Built with modern technologies including Python 3.11, FastAPI, Next.js 15, and PostgreSQL, Onyx delivers a production-ready solution for organizations seeking to leverage AI capabilities without compromising data sovereignty.
Architecture Overview
Understanding the Onyx Architecture
The Onyx architecture represents a sophisticated multi-tier system designed for enterprise scalability and flexibility. At its core, the platform follows a modern microservices-inspired design that separates concerns while maintaining cohesive data flow between components.
Frontend Layer: Next.js 15 with React 18
The frontend is built on Next.js 15, leveraging the latest features including server-side rendering (SSR), static site generation (SSG), and the new App Router architecture. React 18 provides the foundation for a responsive, component-based user interface that delivers a seamless experience across devices. TypeScript ensures type safety throughout the codebase, reducing runtime errors and improving developer productivity. Tailwind CSS enables rapid UI development with utility-first styling, resulting in a consistent and maintainable design system.
The frontend communicates with the backend through a well-defined REST API, with real-time capabilities powered by WebSocket connections for features like streaming responses and live collaboration. The architecture supports both traditional web access and API-first integrations, making it suitable for embedding into existing enterprise workflows.
Backend Layer: Python 3.11 with FastAPI
The backend leverages Python 3.11’s significant performance improvements, including faster CPython execution and enhanced error messaging. FastAPI provides a modern, high-performance web framework with automatic OpenAPI documentation, request validation through Pydantic models, and native async/await support for handling concurrent requests efficiently.
SQLAlchemy serves as the ORM layer, providing database abstraction while maintaining the flexibility to write complex queries when needed. Alembic handles database migrations, ensuring schema changes are version-controlled and can be applied consistently across environments. Celery manages background task processing, enabling long-running operations like document indexing and batch processing without blocking the main application thread.
Data Layer: PostgreSQL with Redis Caching
PostgreSQL serves as the primary data store, chosen for its robust feature set including full-text search capabilities, JSON support for flexible document storage, and excellent performance under load. The database schema is designed to support multi-tenancy, allowing organizations to isolate data between different departments or clients.
Redis provides a high-performance caching layer that significantly reduces response times for frequently accessed data. The caching strategy includes query result caching, session management, and rate limiting to protect against abuse. The architecture also supports Redis as a message broker for Celery, creating a unified infrastructure for both caching and task queuing.
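The query-result caching described above follows a standard cache-aside pattern: check the cache, and on a miss run the query and store the result with a TTL. The sketch below is illustrative rather than Onyx's actual implementation, and it uses an in-process dict where a real deployment would issue Redis `SETEX`/`GET` commands:

```python
import hashlib
import json
import time


class QueryCache:
    """Cache-aside layer for search results. A real deployment would back
    this with Redis (SETEX/GET) instead of the in-process dict used here."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, value)

    def _key(self, query, filters):
        # Stable hash of the query plus any filters, so identical
        # requests map to the same cache entry.
        raw = json.dumps({"q": query, "f": filters}, sort_keys=True)
        return hashlib.sha256(raw.encode()).hexdigest()

    def get_or_compute(self, query, filters, compute):
        key = self._key(query, filters)
        entry = self._store.get(key)
        if entry and entry[0] > time.time():
            return entry[1]  # cache hit
        value = compute(query, filters)  # cache miss: run the real search
        self._store[key] = (time.time() + self.ttl, value)
        return value


calls = []
cache = QueryCache(ttl_seconds=60)

def fake_search(query, filters):
    calls.append(query)
    return [f"result for {query}"]

first = cache.get_or_compute("pricing policy", None, fake_search)
second = cache.get_or_compute("pricing policy", None, fake_search)
# The second call is served from cache, so fake_search runs only once.
```

Hashing the query together with its filters keeps cache entries from leaking across different permission scopes, which matters in a multi-tenant setup.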
Search Layer: Vespa Vector Database
Vespa powers the search capabilities, offering both traditional full-text search and vector-based semantic search. This dual approach enables users to find information through keyword matching while also discovering relevant content through semantic similarity. The vector database stores embeddings generated by multiple supported models, allowing for flexible model selection based on accuracy and performance requirements.
Key Architectural Benefits:
- Scalability: Each layer can be scaled independently based on demand, with horizontal scaling supported for the application layer and vertical scaling for databases.
- Resilience: The architecture includes built-in failover mechanisms, with Redis providing session persistence and PostgreSQL supporting streaming replication for high availability.
- Security: OAuth2 and SAML authentication integrate with existing identity providers, while role-based access control (RBAC) ensures fine-grained permissions management.
- Extensibility: The modular design allows organizations to extend functionality through custom agents, actions, and integrations without modifying core components.
RAG Pipeline
Understanding the Agentic RAG Pipeline
Onyx’s Agentic RAG (Retrieval-Augmented Generation) pipeline represents a significant evolution beyond traditional RAG implementations. While conventional RAG systems simply retrieve documents and pass them to an LLM, Onyx’s approach combines hybrid indexing with intelligent AI agents that actively participate in the retrieval and synthesis process.
Document Ingestion and Processing
The pipeline begins with document ingestion, supporting a wide variety of formats including PDFs, Word documents, HTML pages, plain text files, and structured data from databases. Each document undergoes a sophisticated processing pipeline:
- Text Extraction: Specialized parsers extract text while preserving structure and metadata. For PDFs, this includes maintaining reading order and extracting tables as structured data.
- Chunking Strategy: Documents are split into semantically meaningful chunks using intelligent algorithms that respect document structure. Rather than simple fixed-size chunking, Onyx employs recursive character text splitting with overlap, ensuring context is preserved across chunk boundaries.
- Embedding Generation: Multiple embedding models are supported, including OpenAI's text-embedding-3 series, Cohere's embed models, and open-source alternatives like sentence-transformers. The system can use different embedding models for different document types based on optimization requirements.
- Dual Indexing: Each chunk is indexed in both Vespa's traditional inverted index for keyword search and its vector index for semantic search. This hybrid approach ensures users can find information through both exact matching and conceptual similarity.
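The core idea behind overlap-aware chunking can be sketched in a few lines. This is a simplified stand-in for the recursive character splitting described above (real splitters also prefer paragraph and sentence boundaries over hard character cuts):

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping fixed-size chunks so that context
    spanning a boundary appears in both neighboring chunks."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break
    return chunks


# 500 characters of varied text for demonstration
doc = "".join(chr(97 + i % 26) for i in range(500))
chunks = chunk_text(doc, chunk_size=200, overlap=50)
# Consecutive chunks share `overlap` characters of context.
```

The overlap is what keeps a sentence that straddles a boundary retrievable from either side; without it, queries matching boundary-spanning text would miss both chunks.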
Hybrid Search Implementation
The hybrid search mechanism combines multiple retrieval strategies to maximize recall while maintaining precision:
- Keyword Search: BM25 ranking provides excellent results for exact term matching, particularly effective for technical terminology, product names, and specific identifiers.
- Semantic Search: Vector similarity search enables finding conceptually related content even when keywords don't match. This is particularly valuable for finding answers to questions phrased differently from the source material.
- Fusion Ranking: Results from both search methods are combined using reciprocal rank fusion (RRF), producing a unified ranking that leverages the strengths of both approaches.
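Reciprocal rank fusion is simple enough to show in full. Each document scores the sum of 1/(k + rank) over every ranked list it appears in, so documents that rank well in both keyword and semantic results rise to the top (the document IDs here are illustrative):

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Combine several ranked result lists with reciprocal rank fusion.
    k=60 is the constant from the original RRF paper; it damps the
    influence of any single list's top ranks."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


keyword_hits = ["doc_a", "doc_b", "doc_c"]   # BM25 ranking
semantic_hits = ["doc_c", "doc_a", "doc_d"]  # vector-similarity ranking
fused = reciprocal_rank_fusion([keyword_hits, semantic_hits])
# doc_a (ranks 1 and 2) narrowly outscores doc_c (ranks 3 and 1).
```

Because RRF only looks at ranks, it needs no score normalization between BM25 and cosine similarity, which is exactly why it is a popular fusion choice for hybrid search.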
Agentic Enhancement
What sets Onyx apart is the integration of AI agents into the RAG process. Rather than simply retrieving and presenting documents, agents can:
- Query Refinement: Analyze the user's question and generate optimized search queries that capture the intent more effectively than the original question.
- Multi-hop Retrieval: When initial results are insufficient, agents can formulate follow-up queries to gather additional context, building a comprehensive understanding of the topic.
- Source Verification: Agents cross-reference information across multiple sources, identifying contradictions and highlighting the most reliable information.
- Context Synthesis: Rather than simply concatenating retrieved chunks, agents synthesize information from multiple sources to provide coherent, comprehensive answers.
Re-ranking and Quality Assurance
After initial retrieval, a re-ranking model evaluates and reorders results based on relevance to the specific query. This two-stage retrieval process (initial retrieval followed by re-ranking) significantly improves answer quality while maintaining reasonable latency. The system also includes deduplication logic to avoid presenting redundant information from similar documents.
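The deduplication step can be approximated with a greedy filter over the re-ranked list: keep a chunk only if it is not too similar to anything already kept. The token-set Jaccard measure and the 0.7 threshold below are illustrative choices, not Onyx's actual implementation:

```python
def jaccard(a, b):
    """Token-set Jaccard similarity between two chunks of text."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(chunks, threshold=0.8):
    """Drop chunks that are near-duplicates of an already-kept chunk.
    Assumes `chunks` is ordered best-first (i.e. after re-ranking),
    so the higher-ranked copy of any duplicate pair survives."""
    kept = []
    for chunk in chunks:
        if all(jaccard(chunk, existing) < threshold for existing in kept):
            kept.append(chunk)
    return kept


ranked = [
    "the quarterly report shows revenue growth",
    "the quarterly report shows revenue growth this year",
    "employee onboarding checklist for new hires",
]
unique = deduplicate(ranked, threshold=0.7)
# The second chunk is a near-duplicate of the first and is dropped.
```

Processing the list best-first means the greedy filter never discards a strong result in favor of a weaker duplicate.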
Performance Optimization
The pipeline incorporates several performance optimizations:
- Caching: Frequently accessed embeddings and search results are cached in Redis, reducing database load and improving response times.
- Parallel Processing: Document processing and embedding generation occur in parallel using Celery workers, maximizing throughput for large document collections.
- Incremental Updates: Rather than rebuilding the entire index when documents change, Onyx supports incremental updates, adding or removing documents without downtime.
Agent System
Understanding the Agent System Architecture
Onyx’s agent system provides a powerful framework for creating AI agents with unique instructions, knowledge bases, and action capabilities. This system enables organizations to build specialized assistants tailored to specific workflows, departments, or use cases without requiring deep AI expertise.
Core Agent Components
1. Agent Definition and Configuration
Each agent is defined through a comprehensive configuration that includes:
- System Prompt: The foundational instructions that define the agent's personality, expertise, and behavioral constraints. This prompt establishes the agent's role, communication style, and decision-making framework.
- Knowledge Base Assignment: Agents can be connected to specific document collections, ensuring they have access to relevant information while maintaining data isolation between different organizational units.
- Tool Integration: Agents can be equipped with various tools and actions, extending their capabilities beyond text generation to include external API calls, database queries, and workflow automation.
- Model Selection: Different agents can use different LLM backends based on their requirements. A customer service agent might use a fast, cost-effective model, while a research agent might use a more capable model for complex reasoning tasks.
2. Custom Agent Creation
Organizations can create custom agents through a user-friendly interface or programmatically through the API. The creation process involves:
- Instruction Design: Crafting clear, comprehensive instructions that guide the agent's behavior. Onyx provides templates and best practices for effective instruction design.
- Knowledge Curation: Selecting which documents and data sources the agent should have access to, enabling fine-grained control over information access.
- Action Configuration: Defining what external actions the agent can perform, including API integrations, database operations, and workflow triggers.
- Testing and Iteration: Built-in testing tools allow administrators to evaluate agent performance before deployment, with iteration capabilities to refine instructions and configurations.
3. Multi-Agent Orchestration
For complex tasks, Onyx supports multi-agent orchestration where multiple specialized agents collaborate:
- Task Decomposition: A coordinator agent breaks down complex requests into subtasks suitable for specialized agents.
- Parallel Execution: Independent subtasks are distributed to appropriate agents and executed concurrently, reducing overall response time.
- Result Synthesis: The coordinator agent combines outputs from specialized agents into a coherent final response.
- Conflict Resolution: When agents provide contradictory information, the system includes mechanisms to identify conflicts and seek clarification or additional sources.
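The decompose / execute-in-parallel / synthesize loop maps naturally onto `asyncio`. The sketch below is a minimal illustration of the pattern, not Onyx's orchestration code: the specialist agents are stubs, and a real coordinator would derive the subtask split from the request with an LLM call:

```python
import asyncio


async def run_specialist(name, subtask):
    """Stand-in for a specialized agent; a real system would await an LLM."""
    await asyncio.sleep(0)  # yield control, simulating I/O
    return f"{name} answered: {subtask}"


async def orchestrate(request):
    # 1. Task decomposition (hardcoded here; a coordinator agent would
    #    produce this split from the request).
    subtasks = {
        "research": f"gather background on: {request}",
        "analysis": f"analyze implications of: {request}",
    }
    # 2. Parallel execution of the independent subtasks.
    results = await asyncio.gather(
        *(run_specialist(name, task) for name, task in subtasks.items())
    )
    # 3. Result synthesis into one response.
    return " | ".join(results)


final = asyncio.run(orchestrate("migrate billing to a new provider"))
```

Because the subtasks are independent, total latency approaches that of the slowest specialist rather than the sum of all of them.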
4. Actions and MCP Integration
The Model Context Protocol (MCP) integration enables agents to interact with external applications:
- Tool Definition: Each action is defined with clear input schemas, output schemas, and execution parameters, ensuring type safety and predictable behavior.
- Authentication: Flexible authentication mechanisms support various credential types, including API keys, OAuth tokens, and certificate-based authentication.
- Execution Environment: Actions run in isolated environments with appropriate permissions and resource limits, preventing unauthorized access and ensuring system stability.
- Audit Logging: All action executions are logged for compliance and debugging purposes, providing a complete trail of agent activities.
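The value of schema-checked tool definitions is easiest to see in code. This is a deliberately simplified illustration: MCP itself describes tool inputs with JSON Schema, whereas the sketch below uses a plain name-to-type mapping, and `ToolDefinition`, `create_ticket`, and the handler are all hypothetical names:

```python
from dataclasses import dataclass, field


@dataclass
class ToolDefinition:
    """Illustrative action definition: the input schema is checked before
    the handler runs, mirroring the type-safety goal described above."""
    name: str
    input_schema: dict  # parameter name -> expected Python type
    handler: callable = field(repr=False, default=None)

    def invoke(self, arguments):
        # Reject unknown or missing parameters before execution.
        if set(arguments) != set(self.input_schema):
            raise ValueError(f"expected arguments {sorted(self.input_schema)}")
        # Reject arguments of the wrong type.
        for key, expected_type in self.input_schema.items():
            if not isinstance(arguments[key], expected_type):
                raise TypeError(f"{key} must be {expected_type.__name__}")
        return self.handler(**arguments)


ticket_tool = ToolDefinition(
    name="create_ticket",
    input_schema={"title": str, "priority": int},
    handler=lambda title, priority: {"id": 1, "title": title, "priority": priority},
)
created = ticket_tool.invoke({"title": "VPN outage", "priority": 2})
```

Validating before execution means a malformed LLM tool call fails with a clear error the agent can react to, instead of a half-completed side effect.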
5. Code Execution Sandbox
For data analysis and file manipulation tasks, Onyx includes a secure code execution environment:
- Language Support: Python is the primary language for code execution, with pre-installed libraries for data analysis, visualization, and machine learning.
- Resource Limits: CPU, memory, and execution time limits prevent runaway processes from affecting system stability.
- File System Access: Controlled access to uploaded files and generated artifacts, with automatic cleanup after task completion.
- Output Capture: Both standard output and error streams are captured and returned to the user, enabling interactive debugging and iteration.
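A minimal version of "run untrusted code with a time limit and capture its output" can be built on `subprocess`. This is a sketch of the pattern, not Onyx's sandbox: a production sandbox adds CPU and memory rlimits, a restricted filesystem, and network isolation on top of the wall-clock limit shown here:

```python
import subprocess
import sys


def run_sandboxed(code, timeout_seconds=5):
    """Run Python source in a child process with a wall-clock limit,
    capturing stdout and stderr separately."""
    try:
        completed = subprocess.run(
            [sys.executable, "-c", code],
            capture_output=True,
            text=True,
            timeout=timeout_seconds,
        )
        return {
            "stdout": completed.stdout,
            "stderr": completed.stderr,
            "exit_code": completed.returncode,
        }
    except subprocess.TimeoutExpired:
        # The child is killed when the timeout expires.
        return {"stdout": "", "stderr": "execution timed out", "exit_code": -1}


result = run_sandboxed("print(sum(range(10)))")
# result["stdout"].strip() == "45"
```

Running the code in a separate process (rather than `exec` in-process) is what makes the time limit enforceable: the parent can kill the child without corrupting its own state.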
6. Voice Mode Capabilities
The agent system includes comprehensive voice interaction support:
- Speech-to-Text: Multiple STT backends are supported, including OpenAI's Whisper, Google Speech-to-Text, and self-hosted alternatives for privacy-sensitive deployments.
- Text-to-Speech: Natural-sounding voice output through integrations with various TTS providers, with options for different voices, languages, and speaking rates.
- Real-time Processing: Streaming audio processing enables natural conversational flow without long pauses for processing.
7. Image Generation Integration
Agents can generate images through integrations with image generation models:
- Prompt Engineering: Agents automatically refine user requests into optimized prompts for image generation models.
- Model Selection: Support for multiple image generation backends, including DALL-E, Stable Diffusion, and other open-source alternatives.
- Style Control: Fine-grained control over image style, composition, and quality parameters.
Deployment Modes
Understanding Deployment Options
Onyx offers flexible deployment options to meet diverse organizational needs, from small teams testing the platform to large enterprises requiring high-availability configurations. Each deployment mode is designed to balance ease of setup with production-grade capabilities.
1. Docker Compose Deployment
The simplest deployment option uses Docker Compose, ideal for development, testing, and small-scale production deployments:
- Quick Start: A single `docker compose up` command launches all required services, including the application server, PostgreSQL database, Redis cache, Vespa search engine, and Celery workers.
- Configuration: Environment variables control all aspects of the deployment, from database credentials to LLM API keys. A `.env` file provides a convenient way to manage configuration.
- Resource Management: Docker Compose allows specifying resource limits for each service, preventing any single component from consuming excessive resources.
- Volume Management: Persistent volumes ensure data survives container restarts, with separate volumes for database storage, document uploads, and logs.
- Networking: Internal networking between containers is automatically configured, while port mappings expose only necessary services to the host.
Best For: Development teams, proof-of-concept deployments, small organizations with limited infrastructure requirements.
2. Kubernetes Deployment
For organizations requiring enterprise-grade scalability and management, Onyx provides comprehensive Kubernetes support:
- Helm Charts: Official Helm charts simplify deployment, with configurable values for replica counts, resource limits, and ingress settings.
- Horizontal Pod Autoscaling: Kubernetes HPA automatically scales application pods based on CPU and memory utilization, handling traffic spikes without manual intervention.
- Rolling Updates: Zero-downtime deployments are achieved through Kubernetes rolling update strategies, ensuring continuous availability during upgrades.
- Service Mesh Integration: Optional integration with service meshes like Istio provides advanced traffic management, security policies, and observability.
- Secret Management: Kubernetes secrets integrate with external secret management systems like HashiCorp Vault, ensuring sensitive credentials are properly secured.
- Multi-zone Deployment: Kubernetes enables deployment across multiple availability zones for high availability, with automatic failover if a zone becomes unavailable.
Best For: Medium to large organizations, cloud-native environments, teams with existing Kubernetes infrastructure.
3. Cloud Provider Deployments
Onyx can be deployed on major cloud providers with provider-specific optimizations:
AWS Deployment:
- ECS/Fargate: Serverless container deployment eliminates the need to manage underlying infrastructure.
- RDS for PostgreSQL: Managed database service with automated backups, multi-AZ deployment, and automated failover.
- ElastiCache for Redis: Managed Redis with automatic failover and cluster mode for high availability.
- OpenSearch Service: Alternative to Vespa for organizations already invested in AWS search services.
Google Cloud Deployment:
- Cloud Run: Serverless container platform with automatic scaling based on request volume.
- Cloud SQL: Managed PostgreSQL with high availability configuration and automated backups.
- Memorystore: Managed Redis service with sub-millisecond latency.
Azure Deployment:
- Azure Container Apps: Serverless container service with KEDA-based autoscaling.
- Azure Database for PostgreSQL: Managed database with flexible server configuration.
- Azure Cache for Redis: Enterprise-grade Redis with clustering support.
Best For: Organizations with existing cloud infrastructure, teams preferring managed services over self-hosted infrastructure.
4. Self-Hosted Bare Metal Deployment
For organizations with strict data sovereignty requirements or existing infrastructure investments:
- Manual Installation: Step-by-step guides for installing each component on bare metal servers or virtual machines.
- High Availability Configuration: Detailed instructions for configuring PostgreSQL streaming replication, Redis Sentinel for failover, and load balancing for application servers.
- Monitoring Integration: Support for Prometheus, Grafana, and other monitoring tools for comprehensive observability.
- Backup and Recovery: Automated backup scripts with tested recovery procedures for disaster recovery scenarios.
Best For: Organizations with strict data sovereignty requirements, air-gapped environments, teams with dedicated infrastructure management capabilities.
Deployment Considerations
When selecting a deployment mode, consider:
- Scale Requirements: Expected concurrent users, document volume, and query frequency influence infrastructure sizing.
- Availability Requirements: Mission-critical deployments require high-availability configurations with automatic failover.
- Security Requirements: Data sensitivity determines whether cloud deployment is acceptable or self-hosting is required.
- Operational Capacity: Available DevOps expertise influences whether managed services or self-managed infrastructure is more appropriate.
- Budget Constraints: Cloud managed services reduce operational overhead but increase direct costs; self-hosting requires more expertise but can be more cost-effective at scale.
Key Features
Agentic RAG
Onyx’s Agentic RAG combines the best of traditional search with AI-powered intelligence. Unlike conventional RAG systems that passively retrieve documents, Agentic RAG actively participates in the information retrieval process. The system understands query intent, performs multi-hop retrieval when necessary, and synthesizes information from multiple sources to provide comprehensive answers.
Key Capabilities:
- Hybrid search combining keyword and semantic retrieval
- Query refinement and expansion for improved recall
- Multi-hop retrieval for complex questions
- Source citation and verification
- Context-aware chunking and retrieval
Deep Research
The Deep Research feature represents Onyx’s most sophisticated capability, achieving top rankings on the Deep Research leaderboard in February 2026. This feature enables the system to conduct comprehensive research on complex topics through multi-step investigation.
Research Process:
- Query Analysis: Understanding the research question and identifying key concepts
- Source Discovery: Finding relevant sources across connected knowledge bases and web search
- Information Extraction: Extracting key facts, figures, and insights from sources
- Cross-Reference: Verifying information across multiple sources
- Synthesis: Combining findings into a comprehensive research report
- Citation: Providing proper attribution for all sourced information
Custom Agents
Organizations can create specialized AI agents tailored to specific use cases:
- Customer Service Agents: Trained on product documentation and support tickets
- Research Assistants: Connected to academic databases and research repositories
- HR Assistants: Knowledgeable about company policies and procedures
- Sales Enablement: Equipped with product information and competitive intelligence
Web Search Integration
Onyx integrates with multiple web search providers for real-time information retrieval:
| Provider | Features | Best For |
|---|---|---|
| Serper | Fast, cost-effective Google Search API | General web search |
| Google PSE | Custom search engines, refined results | Domain-specific search |
| Brave Search | Privacy-focused, independent index | Privacy-sensitive applications |
| SearXNG | Self-hosted, metasearch aggregation | Complete control over search |
| Firecrawl/Exa | AI-optimized search with content extraction | Research applications |
Artifacts Generation
The Artifacts feature enables agents to create downloadable content:
- Documents: Generate reports, summaries, and documentation in various formats
- Graphics: Create diagrams, charts, and visualizations
- Code: Produce executable code snippets with explanations
- Data Files: Export structured data in CSV, JSON, or other formats
Actions and MCP
The Model Context Protocol integration enables agents to interact with external systems:
- API Integration: Connect to any REST or GraphQL API
- Database Operations: Query and update databases with proper authentication
- Workflow Triggers: Initiate business processes in external systems
- File Operations: Read, write, and manage files in connected storage
Code Execution
The secure sandbox environment enables data analysis and computation:
```python
# Example: Data analysis in the sandbox
import pandas as pd
import matplotlib.pyplot as plt

# Load uploaded data
df = pd.read_csv('/data/uploaded_file.csv')

# Perform analysis
summary = df.describe()
correlation = df.corr(numeric_only=True)

# Generate visualization (pass figsize to plot() so it takes effect)
ax = df.plot(kind='bar', figsize=(10, 6))
ax.figure.savefig('/output/analysis_result.png')
plt.close(ax.figure)
```
Voice Mode
Comprehensive voice interaction capabilities:
- Speech-to-Text: Multiple backend support including Whisper
- Text-to-Speech: Natural voice output with various voice options
- Real-time Processing: Streaming audio for natural conversation flow
- Language Support: Multiple languages for global deployments
Image Generation
AI-powered image creation:
- Multiple Backends: DALL-E, Stable Diffusion, and open-source alternatives
- Prompt Optimization: Automatic prompt refinement for better results
- Style Control: Fine-grained control over artistic style and composition
- Batch Generation: Generate multiple variations for selection
Installation
Prerequisites
- Docker and Docker Compose (for containerized deployment)
- Python 3.11+ (for development deployment)
- PostgreSQL 14+ (if not using Docker)
- Redis 6+ (if not using Docker)
Quick Start with Docker Compose
```bash
# Clone the repository
git clone https://github.com/onyx-dot-app/onyx.git
cd onyx

# Copy environment configuration
cp .env.example .env

# Edit .env with your configuration
# Required: LLM API keys (OpenAI, Anthropic, etc.)
# Required: Database credentials
# Optional: OAuth/SAML configuration

# Start all services
docker compose up -d

# Access the application
# Web UI: http://localhost:3000
# API: http://localhost:8080
```
Configuration
Key environment variables:
```bash
# Database
POSTGRES_USER=onyx
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=onyx

# Redis
REDIS_URL=redis://redis:6379/0

# LLM Providers
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...

# Search
VESPA_HOST=vespa
VESPA_PORT=19071

# Authentication
OAUTH_CLIENT_ID=your_client_id
OAUTH_CLIENT_SECRET=your_client_secret
```
Kubernetes Deployment
```bash
# Add the Onyx Helm repository
helm repo add onyx https://helm.onyx.app
helm repo update

# Install with default configuration
helm install onyx onyx/onyx

# Or with custom values
helm install onyx onyx/onyx -f values.yaml
```
Usage Examples
Creating a Custom Agent
```python
import requests

# Create a new agent
agent_config = {
    "name": "Research Assistant",
    "description": "Specialized in academic research",
    "system_prompt": """You are a research assistant specialized in
academic literature. Help users find relevant papers, summarize
findings, and identify research gaps.""",
    "knowledge_base_ids": ["academic_papers", "research_data"],
    "tools": ["web_search", "document_analysis"],
    "model": "gpt-4"
}

response = requests.post(
    "http://localhost:8080/api/agents",
    json=agent_config,
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
agent_id = response.json()["id"]
```
Querying with RAG
```python
import requests

# Submit a query
query = {
    "query": "What are the latest advances in quantum computing?",
    "agent_id": agent_id,  # ID of the agent created above
    "options": {
        "search_type": "hybrid",
        "max_sources": 10,
        "include_citations": True
    }
}

response = requests.post(
    "http://localhost:8080/api/query",
    json=query,
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
result = response.json()

print(result["answer"])
for citation in result["citations"]:
    print(f"Source: {citation['source']}")
```
Deep Research
```python
import time

import requests

# Initiate deep research
research_request = {
    "topic": "Impact of AI on healthcare diagnostics",
    "depth": "comprehensive",
    "sources": ["web", "knowledge_base"],
    "output_format": "report"
}
response = requests.post(
    "http://localhost:8080/api/research",
    json=research_request,
    headers={"Authorization": "Bearer YOUR_API_KEY"}
)
research_id = response.json()["research_id"]

# Poll for completion, bailing out if the task fails
while True:
    status = requests.get(
        f"http://localhost:8080/api/research/{research_id}",
        headers={"Authorization": "Bearer YOUR_API_KEY"}
    ).json()
    if status["status"] == "completed":
        print(status["report"])
        break
    if status["status"] == "failed":
        raise RuntimeError("Research task failed")
    time.sleep(10)
```
Technology Stack
Backend Technologies
| Component | Technology | Purpose |
|---|---|---|
| Language | Python 3.11 | Core backend development |
| Framework | FastAPI | REST API and async handling |
| ORM | SQLAlchemy | Database abstraction |
| Migrations | Alembic | Schema version control |
| Task Queue | Celery | Background job processing |
| Cache | Redis | Session and query caching |
| Database | PostgreSQL | Primary data storage |
| Search | Vespa | Vector and full-text search |
Frontend Technologies
| Component | Technology | Purpose |
|---|---|---|
| Framework | Next.js 15 | Server-side rendering |
| UI Library | React 18 | Component architecture |
| Language | TypeScript | Type safety |
| Styling | Tailwind CSS | Utility-first styling |
| State | Zustand | Client state management |
AI/ML Components
| Component | Technology | Purpose |
|---|---|---|
| LLM Framework | LangChain | Agent orchestration |
| LLM Gateway | LiteLLM | Multi-provider support |
| Embeddings | Multiple | Document vectorization |
| Vector DB | Vespa | Similarity search |
Conclusion
Onyx represents a significant advancement in open-source AI platforms, providing organizations with a powerful, flexible, and privacy-respecting alternative to commercial solutions. Its combination of Agentic RAG, Deep Research capabilities, and extensible agent system makes it suitable for a wide range of enterprise applications.
The platform’s modular architecture allows organizations to start small and scale as needed, with deployment options ranging from simple Docker Compose setups to enterprise Kubernetes clusters. The comprehensive feature set, including voice interaction, image generation, and code execution, positions Onyx as a complete solution for AI-powered knowledge management and automation.
With its rapidly growing community and active development, Onyx is poised to remain at the forefront of open-source AI platforms. Organizations looking to implement AI capabilities while maintaining control over their data should consider Onyx as a compelling option.
Related Posts
- AgentSkillOS: Skill Orchestration System
- MattPocock Skills: AI Agent Workflows
- AI Hedge Fund: Multi-Agent Investment System