Future AGI: The Open-Source Platform for Self-Improving AI Agents

Future AGI is an open-source platform that collapses the entire AI agent lifecycle into one self-improving feedback loop. Instead of stitching together separate tools for evaluations, observability, and guardrails, Future AGI provides simulate, evaluate, protect, monitor, optimize, and gateway capabilities in a single platform – so agents do not just get monitored, they self-improve.

Architecture Overview

Understanding the Six-Pillar Architecture

The architecture diagram above illustrates Future AGI’s six core pillars and how they connect through a self-improving feedback loop. Let’s break down each component:

Simulate – Personas and Edge Cases The simulation engine generates thousands of multi-turn conversations against realistic personas, adversarial inputs, and edge cases. It supports both text and voice agents (LiveKit, VAPI, Retell, Pipecat), enabling teams to test their agents under conditions that mirror real-world usage before deployment.

Evaluate – 50+ Metrics Under One Call A single evaluate() call runs 50+ metrics: groundedness, hallucination detection, tool-use correctness, PII leakage, tone analysis, and custom rubrics. The evaluation engine combines three approaches – LLM-as-judge, heuristic checks, and ML classifiers – to provide comprehensive coverage without requiring multiple tools.

Protect – 18 Built-in Scanners + 15 Vendor Adapters The protection layer includes 18 built-in scanners (PII, jailbreak, injection, and more) plus 15 vendor adapters (Lakera, Presidio, Llama Guard, and others). These can run inline in the gateway or as a standalone SDK, providing real-time guardrails for production agents.

Monitor – OpenTelemetry-Native Tracing Monitoring is built on OpenTelemetry with 50+ framework instrumentors (LangChain, LlamaIndex, CrewAI, DSPy, and more). Span graphs, latency tracking, token cost analysis, and live dashboards come with zero configuration.

Command Center – 100+ Provider Gateway The Go-based gateway handles ~29k requests per second on t3.xlarge with P99 latency under 21ms even with guardrails enabled. It supports 100+ LLM providers, 15 routing strategies, semantic caching, virtual keys, MCP, and A2A protocols.

Optimize – 6 Prompt Optimization Algorithms Six algorithms (GEPA, PromptWizard, ProTeGi, Bayesian, Meta-Prompt, and Random) automatically improve prompts using production traces as training data, closing the feedback loop.

Platform Stack

Understanding the Platform Stack

The platform stack diagram shows the four-layer architecture of Future AGI, where every arrow represents an open, documented interface:

Client Layer – traceAI SDKs The traceAI instrumentation libraries provide zero-config OpenTelemetry tracing for Python, TypeScript, Java, and C#. With just two lines of code, your existing OpenAI (or other framework) calls are automatically traced, with spans flowing into the platform via OTLP.

Edge Layer – Agent Command Center The Go-based gateway sits at the edge, handling all incoming requests with OpenAI-compatible HTTP. It provides weighted routing (~9.9ns per decision), semantic caching, virtual keys, and inline guardrail enforcement. The 29k req/s throughput with P99 under 21ms makes it production-ready for high-traffic deployments.

Platform Layer – Django 4.2 + Channels The core platform runs on Django 4.2 with Channels for real-time WebSocket support, plus Temporal for durable workflow orchestration. This layer houses all six pillars: simulate, evaluate, protect, monitor, optimize, and the command center management.

Data Layer – PostgreSQL + ClickHouse + Redis + RabbitMQ PostgreSQL stores metadata and configuration, ClickHouse handles spans and time-series data at scale, Redis manages state and caching, and RabbitMQ plus Temporal handle job queues and durable workflows. Every interface uses standard SQL or OTLP, so you can drop in your own infrastructure at any layer.

Evaluation Pipeline

Understanding the Evaluation Pipeline

The evaluation pipeline diagram shows how agent traces and outputs flow through three parallel evaluator types before being aggregated into a single score set:

LLM-as-Judge The LLM judge uses a separate language model to evaluate outputs on dimensions that require semantic understanding: groundedness (is the response based on provided context?), hallucination detection (does the response invent facts?), tool-use correctness (did the agent call the right tools with the right parameters?), and overall response quality. This approach catches subtle errors that rule-based systems miss.

Heuristic Evaluators Heuristic checks provide deterministic, fast evaluation for well-defined patterns: PII detection via regex, length constraints, format validation (JSON schema, URL patterns), and custom rule-based checks. These run in milliseconds and provide consistent, reproducible scores.

ML Classifiers Custom ML models handle classification tasks that sit between heuristic rules and LLM judgment: embedding similarity for semantic deduplication, toxicity detection, topic classification, and custom classifiers trained on your specific domain data. These models run locally for low-latency production use.

Aggregation via evaluate() All three evaluator types feed into a single evaluate() call that returns 50+ metrics in one response. This eliminates the need to manage separate evaluation tools and ensures all scores are computed consistently against the same trace data.

Self-Improving Feedback Loop

Feedback Loop

Understanding the Self-Improving Feedback Loop

The feedback loop diagram illustrates the core value proposition of Future AGI: agents that do not just get monitored, they self-improve. Here is how each stage connects:

1. Deploy Agent to Production You deploy your AI agent (customer support, voice assistant, RAG pipeline, or coding agent) with traceAI instrumentation. From the first request, all interactions are captured as OpenTelemetry spans.

2. Monitor – OTel Traces, Cost, and Latency Every request flows through the monitoring layer, producing span graphs, latency histograms, token cost breakdowns, and live dashboards. You see exactly what your agent is doing, how long it takes, and how much it costs – with zero configuration.

3. Evaluate – 50+ Metrics Production traces are automatically evaluated against your chosen metrics. Hallucination rates, groundedness scores, tool-use accuracy, PII leakage, and custom rubrics are all computed and tracked over time. Anomalies trigger alerts.

4. Simulate – Persona-Driven Edge Cases Based on evaluation results, the simulation engine generates targeted test scenarios. If your agent struggles with adversarial inputs, the simulator creates more of them. If voice agents fail on specific turn patterns, LiveKit-driven simulations reproduce those patterns at scale.

5. Optimize – 6 Prompt Optimization Algorithms Production traces and simulation results feed into the optimization engine. Six algorithms (GEPA, PromptWizard, ProTeGi, Bayesian, Meta-Prompt, and Random) automatically improve prompts, system instructions, and agent configurations. The best-performing variants are promoted.

6. Protect – 18 Scanners + 15 Adapters Guardrails run inline in the gateway, blocking PII, jailbreaks, prompt injections, and other threats in real time. The protection layer also feeds signal back into the optimization engine, creating a virtuous cycle where the agent becomes both safer and more effective over time.

The loop then returns to deployment, where the improved agent version replaces the previous one – and the cycle continues.

Installation

Quick Start with Docker Compose

      
        # Clone the repository
git clone https://github.com/future-agi/future-agi.git
cd future-agi

# Copy environment configuration
cp futureagi/.env.example futureagi/.env

# Start the full stack
docker compose up -d

Open http://localhost:3031 to access the dashboard.

Instrument Your First Agent (Python)

      
        from fi_instrumentation import register
from traceai_openai import OpenAIInstrumentor

register(project_name="my-agent")
OpenAIInstrumentor().instrument()

# Your existing OpenAI code is now traced
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": query}],
)

Instrument Your First Agent (TypeScript)

      
        import { register } from "@traceai/fi-core";
import { OpenAIInstrumentation } from "@traceai/openai";

register({ projectName: "my-agent" });
new OpenAIInstrumentation().instrument();

// Your existing OpenAI code is now traced
const response = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: query }],
});

Cloud Option (No Install)

pip install ai-evaluation

Key Features

Feature	Description
Agent Simulation	Thousands of multi-turn conversations with realistic personas and adversarial inputs, including voice agents
50+ Evaluation Metrics	Groundedness, hallucination, tool-use correctness, PII, tone, custom rubrics via LLM-as-judge + heuristic + ML
18 Built-in Scanners	PII, jailbreak, injection detection, and more, plus 15 vendor adapters (Lakera, Presidio, Llama Guard)
OpenTelemetry Tracing	50+ framework instrumentors with span graphs, latency, token cost, and live dashboards
Agent Command Center	Go-based gateway with ~29k req/s, P99 under 21ms, 100+ providers, 15 routing strategies
Prompt Optimization	6 algorithms (GEPA, PromptWizard, ProTeGi, Bayesian, Meta-Prompt, Random) using production traces
Self-Improving Loop	Production traces feed back as training data for automatic prompt and agent improvement
Open Source	Apache 2.0 license – inspect every evaluator, prompt, and trace with no black-box scoring
Self-Hostable	Docker Compose, Kubernetes, or any container runtime (ECS, Cloud Run, AKS, EKS, GKE)

Comparison with Alternatives

Feature	Future AGI	Langfuse	Phoenix	Braintrust	Helicone
Open Source	Apache 2.0	MIT	Elastic v2	No	Apache 2.0
Self-Host	Yes	Yes	Yes	No	Yes
Agent Simulation	Yes	No	No	No	No
Voice Agent Eval	Yes	No	Limited	No	No
LLM Gateway	Yes (100+ providers)	No	No	Yes	Yes
Guardrails	Yes (18 + 15 adapters)	No	No	No	No
Prompt Optimization	Yes (6 algorithms)	No	No	No	No

SDKs and Integrations

Future AGI provides independently usable, Apache/MIT-licensed SDKs:

SDK	Install	Purpose
traceAI	`pip install fi-instrumentation-otel` / `npm i @traceai/fi-core`	Zero-config OTel tracing for 50+ AI frameworks
ai-evaluation	`pip install ai-evaluation` / `npm i @future-agi/ai-evaluation`	50+ evaluation metrics + guardrail scanners
futureagi	`pip install futureagi`	Platform SDK for datasets, prompts, KB, experiments
agent-opt	`pip install agent-opt`	6 prompt optimization algorithms
simulate-sdk	`pip install agent-simulate`	Voice-agent simulation via LiveKit + Silero VAD
agentcc	`pip install agentcc` / `npm i @agentcc/client`	Gateway client SDKs

Troubleshooting

Issue	Solution
Docker Compose fails to start	Ensure ports 3031, 5432, 8123, 6379, and 5672 are not in use by other services
traceAI not showing traces	Verify `register()` is called before any LLM client initialization
Gateway returns 401	Check that your API key is configured in the dashboard under Settings > API Keys
ClickHouse connection refused	Ensure ClickHouse is running: `docker compose ps clickhouse`
Evaluation scores are 0	Confirm your LLM judge API key is set in the environment configuration
High latency on gateway	Check routing strategy; weighted routing adds ~9.9ns overhead per request

Conclusion

Future AGI stands out as the only open-source platform that combines agent simulation, evaluation, guardrails, monitoring, gateway, and prompt optimization into a single self-improving feedback loop. With Apache 2.0 licensing, OpenTelemetry-native tracing, a Go-based gateway handling 29k req/s, and 50+ evaluation metrics out of the box, it eliminates the need to stitch together Langfuse, Braintrust, Helicone, and Guardrails AI separately.

Whether you are deploying customer support agents, voice assistants, RAG pipelines, or coding agents, Future AGI provides the infrastructure to simulate edge cases before launch, evaluate what happens in production, protect users in real time, and turn every trace into signal for the next version.

Links:

GitHub: https://github.com/future-agi/future-agi
Documentation: https://docs.futureagi.com
Cloud Dashboard: https://app.futureagi.com
Discord: https://discord.gg/UjZ2gRT5p

Enjoyed this post? Never miss out on future posts by following us

Future AGI: The Open-Source Platform for Self-Improving AI Agents

Future AGI: The Open-Source Platform for Self-Improving AI Agents

Understanding the Six-Pillar Architecture

Platform Stack

Understanding the Platform Stack

Evaluation Pipeline

Understanding the Evaluation Pipeline

Self-Improving Feedback Loop

Understanding the Self-Improving Feedback Loop

Installation

Quick Start with Docker Compose

Instrument Your First Agent (Python)

Instrument Your First Agent (TypeScript)

Cloud Option (No Install)

Key Features

Comparison with Alternatives

SDKs and Integrations

Troubleshooting

Conclusion

Related Posts

Learn Observability in a Single Post: A Complete Tutorial...

Claude Video: Give Claude Code the Ability to Watch and U...

AI Website Cloner: Clone Any Website With One Command Usi...

OfficeCLI: The First Office Suite Built for AI Agents to ...

Orca: Agent Development Environment for Running a Fleet o...

OmniRoute: Free AI Gateway With 231+ Providers and One Un...

Contents