What if your AI coding agent never ran out of context — and never sent a single byte of your code to the cloud? OpenMonoAgent.ai is an open-source, local-first terminal agent built on .NET 10 that bundles its own llama.cpp inference server, giving you a full agentic loop with 20 built-in tools, Docker sandboxing, and deep code intelligence. After the one-time setup, every token is free. Your code never leaves your machine.
OpenMono runs the model on your hardware via llama.cpp — an RTX 3090 or a workstation NUC is all you need. After setup, inference costs nothing. No account, no usage dashboard, no API key.
Key Features
- Zero-cost inference — llama.cpp ships bundled inside Docker; the installer detects your hardware and picks the right model automatically. GPU or CPU, every token is free after setup
- 20 built-in tools with a 12-step pipeline — every tool call goes through parse, schema validation, path sanity check, plan-mode guard, capability check, caching, pre-hook, execution, post-hook, and artifact storage. Nothing bypasses the pipeline
- 5 specialist sub-agents — Explore (read-only discovery, 15 turns), Plan (architecture, 10 turns), Coder (full file access, 30 turns), Verify (adversarial testing, 20 turns), and General-purpose (25 turns), each with locked tool sets and turn budgets
- Docker-native sandboxing — your project mounts as
/workspace; the agent can read and write real files but nothing outside that mount is visible or reachable - Deep code intelligence — LSP support for TypeScript, Python, Go, Rust, and C#; native Roslyn C# analysis with type hierarchy, blast-radius, and cross-file symbol search; auto-detects graphify and code-review-graph
- Playbooks — YAML workflows with typed parameters, conditional gates (Confirm/Review/Approve), checkpoint/resume, and composable step dependencies
- 4 hot-swappable providers — local llama.cpp (default, fully supported), OpenAI, Anthropic, and Ollama; switch mid-session with
/model - 192K token context window — with automatic checkpointing at 65% and compaction at 80% to keep the agent running indefinitely
- Doom-loop detection — aborts automatically if the same tool sequence repeats 3 times in a row
- Cross-session memory — persistent YAML-based memory store that carries context across sessions
How It Works
OpenMonoAgent is a .NET 10 CLI driving a local llama.cpp inference server over HTTP, everything sandboxed in Docker. The architecture follows a clean separation between the agentic loop and the inference backend.
The Agentic Loop
The ConversationLoop is the heart of the system. On startup, it builds a system prompt from base instructions, project-level OPENMONO.md, cross-session memory, and git context. Each turn follows this flow:
- Context management — if context usage exceeds 65%, the LLM generates a checkpoint summary; at 80%, compaction kicks in as a fallback
- LLM streaming — the agent streams tokens via SSE, dispatching thinking deltas, text deltas, and tool call deltas as they arrive
- Tool execution — tool calls go through the 12-step pipeline with schema validation, permission checks, hooks, and artifact storage
- Loop continuation — results are added to the session and fed back to the LLM for the next iteration (up to 25 iterations per turn)
The turn ends when the LLM produces text with no tool calls. The session is saved to JSONL for persistence and resumption.
The 12-Step Tool Pipeline
Every tool call is processed through a rigorous pipeline: parse JSON arguments, schema validation, sanity check (reject paths outside workspace), plan-mode guard, capability check via PermissionEngine, result cache lookup, pre-tool hook, execute, post-tool hook, artifact store (large results stored, reference returned), cache write, and file cache invalidation. Read-only tools run in parallel for speed.
Sub-Agent Architecture
Sub-agents spawn isolated sessions with restricted tool sets and dedicated system prompts. The Explore agent is read-only for discovery. The Plan agent handles architecture without writes. The Coder agent has full file access for implementation. The Verify agent performs adversarial testing with Roslyn and LSP. The parent session’s permission engine is reused across all sub-agents.
Getting Started
Install OpenMono with a single command:
bash <(curl -fsSL https://raw.githubusercontent.com/StartupHakk/OpenMonoAgent.ai/refs/heads/main/get-openmono.sh)
Then from any project directory:
cd your-project/
# TUI mode (default for interactive terminals)
openmono agent
# Classic scrolling terminal mode
openmono agent --classic
The installer detects your hardware and selects the optimal model:
| Hardware | Model | Speed |
|---|---|---|
| GPU 24 GB+ | Qwen3.6-27B-Q4_K_M | ~45-70 tok/s |
| GPU 16 GB | Qwen3.6-27B-UD-IQ3_XXS | ~20-42 tok/s |
| GPU 12 GB | Qwen3.5-9B-Q4_K_M | ~38-40 tok/s |
| CPU 24 GB RAM | Qwen3.6-35B-A3B-UD-Q4_K_XL | ~17-20 tok/s |
Switch models mid-session:
/model gpt-4o # OpenAI
/model claude-sonnet-4-20250514 # Anthropic
/model qwen3.6-27b # Back to local
Use playbooks for structured workflows:
/commit # Auto-generate conventional commit
/release minor # End-to-end release pipeline
/release patch --dry-run true # Dry-run release
Why OpenMonoAgent Matters
Most coding agents are cloud products with per-token billing and no ceiling on costs. Your prompts, your code, and your context hit someone else’s servers on every keystroke. OpenMonoAgent flips that model entirely.
AI shouldn’t be a subscription you rent. It should be infrastructure you own — sitting on your desk, serving your code, answering only to you.
The project delivers three fundamental shifts:
-
Economic — after one-time hardware setup, inference is genuinely free. No per-token billing, no usage caps, no surprise invoices. A 192K context window with automatic compaction means the agent can work on large codebases indefinitely without hitting context limits.
-
Privacy — fully offline capable. Your code never leaves your machine. No API keys, no cloud accounts, no data exfiltration risk. For teams working on proprietary codebases or in regulated industries, this is a hard requirement, not a nice-to-have.
-
Architectural depth — the 12-step tool pipeline, capability-based permission system, doom-loop detection, and sub-agent architecture represent a level of engineering rigor that most coding agents lack. Playbooks add structured, composable automation with typed parameters and human gates.
Built on .NET 10 with a Spectre.Console-powered TUI, OpenMonoAgent is a serious engineering tool for developers who want AI-assisted coding without the cloud dependency. With 765 stars and active development, it’s one of the most promising local-first coding agents in the open-source ecosystem. Enjoyed this post? Never miss out on future posts by following us