CodeGraph: Pre-Indexed Code Knowledge Graph for AI Coding Agents

CodeGraph is a pre-indexed code knowledge graph that supercharges AI coding agents like Claude Code, Cursor, Codex CLI, and opencode with semantic code intelligence. Instead of agents spending dozens of tool calls scanning files with grep, glob, and Read, CodeGraph provides instant access to symbol relationships, call graphs, and code structure through a local SQLite database. The result, across six real-world codebases: 84% to 96% fewer tool calls and 43% to 82% faster exploration.

The Problem: Wasteful Code Exploration

When Claude Code explores an unfamiliar codebase, it spawns Explore agents that scan files using grep, glob, and Read tool calls. Each tool call consumes tokens and time. For a large project like VS Code, answering a single architecture question can require 52 tool calls and 1 minute 37 seconds of exploration time.

The core issue is that AI agents lack structural knowledge of the codebase. They must discover file layouts, trace function calls, and map class relationships from scratch every time. This repetitive discovery process wastes context window capacity and slows down every coding task.

Key Insight: CodeGraph benchmarks show that on the VS Code codebase (4,002 files, 59,377 nodes), an Explore agent answered the same architecture question with just 3 tool calls and 17 seconds – a 94% reduction in tool calls and 82% faster completion.

How CodeGraph Works

CodeGraph builds a semantic knowledge graph of your codebase using tree-sitter AST parsing, stores it in a local SQLite database with FTS5 full-text search, and exposes it to AI agents through an MCP server with 8 specialized tools.

[Figure: CodeGraph Architecture]

Understanding the Architecture

The architecture follows a layered pipeline design in which source code flows through extraction, reference resolution, framework detection, and storage layers, with an auto-sync layer keeping the graph current, before AI agents query it through the MCP server.

Layer 1: ExtractionOrchestrator

The extraction layer uses tree-sitter to parse source code into Abstract Syntax Trees (ASTs). Language-specific queries then extract nodes (functions, classes, methods, imports) and edges (calls, imports, extends, implements) from the AST. CodeGraph supports 22 NodeKinds (file, module, class, struct, interface, trait, protocol, function, method, property, field, variable, constant, enum, enum_member, type_alias, namespace, parameter, import, export, route, component) and 12 EdgeKinds (contains, calls, imports, exports, extends, implements, references, type_of, returns, instantiates, overrides, decorates).
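To make this vocabulary concrete, here is a minimal TypeScript sketch of a graph built from these node and edge kinds. Only the kind names come from the list above; the `GraphNode`/`GraphEdge` shapes, the id format, and the `CodeGraphModel` class are hypothetical and are not CodeGraph's internal types.

```typescript
// Hypothetical data model; only the kind names are taken from the list above.
const NODE_KINDS = [
  "file", "module", "class", "struct", "interface", "trait", "protocol",
  "function", "method", "property", "field", "variable", "constant", "enum",
  "enum_member", "type_alias", "namespace", "parameter", "import", "export",
  "route", "component",
] as const;

const EDGE_KINDS = [
  "contains", "calls", "imports", "exports", "extends", "implements",
  "references", "type_of", "returns", "instantiates", "overrides", "decorates",
] as const;

type NodeKind = (typeof NODE_KINDS)[number];
type EdgeKind = (typeof EDGE_KINDS)[number];

interface GraphNode {
  id: string; // illustrative id scheme: "<file>#<qualified name>"
  kind: NodeKind;
  name: string;
  file: string;
}

interface GraphEdge {
  source: string; // id of the node the edge starts from
  target: string; // id of the node the edge points to
  kind: EdgeKind;
}

class CodeGraphModel {
  readonly nodes = new Map<string, GraphNode>();
  readonly edges: GraphEdge[] = [];

  addNode(node: GraphNode): void {
    this.nodes.set(node.id, node);
  }

  // Refuse edges whose endpoints were never extracted.
  addEdge(edge: GraphEdge): void {
    if (!this.nodes.has(edge.source) || !this.nodes.has(edge.target)) {
      throw new Error(`dangling edge ${edge.source} -> ${edge.target}`);
    }
    this.edges.push(edge);
  }

  // Targets of outgoing edges of one kind, e.g. everything a method calls.
  outgoing(id: string, kind: EdgeKind): GraphNode[] {
    return this.edges
      .filter((e) => e.source === id && e.kind === kind)
      .map((e) => this.nodes.get(e.target)!);
  }
}

// A file contains a class, the class contains a method, the method calls a helper.
const g = new CodeGraphModel();
g.addNode({ id: "auth.ts", kind: "file", name: "auth.ts", file: "auth.ts" });
g.addNode({ id: "auth.ts#UserService", kind: "class", name: "UserService", file: "auth.ts" });
g.addNode({ id: "auth.ts#UserService.login", kind: "method", name: "login", file: "auth.ts" });
g.addNode({ id: "auth.ts#hashPassword", kind: "function", name: "hashPassword", file: "auth.ts" });
g.addEdge({ source: "auth.ts", target: "auth.ts#UserService", kind: "contains" });
g.addEdge({ source: "auth.ts#UserService", target: "auth.ts#UserService.login", kind: "contains" });
g.addEdge({ source: "auth.ts#UserService.login", target: "auth.ts#hashPassword", kind: "calls" });

console.log(g.outgoing("auth.ts#UserService.login", "calls").map((n) => n.name).join(", ")); // hashPassword
```

Keeping edges as typed records with a `kind` field means every relationship question (containment, calls, inheritance) becomes the same filtered lookup.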

Layer 2: ReferenceResolver

After extraction, the resolution layer connects the dots: function calls link to their definitions, imports resolve to source files, class inheritance chains are established, and framework-specific patterns are detected. The resolver handles import resolution with path-alias support (tsconfig paths, Cargo workspace member globs) and name matching across modules.
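The path-alias part of import resolution can be sketched for tsconfig `paths`, following TypeScript's own rule that an exact key wins and otherwise the pattern with the longest prefix before `*` is used. This is an illustrative sketch, not CodeGraph's resolver: `resolveAlias`, the extension list, and the injected `fileExists` callback are assumptions, and Cargo workspace globs are not covered.

```typescript
// Minimal sketch of tsconfig-style `paths` alias resolution.
// Assumptions: POSIX paths, an absolute baseUrl, and a caller-supplied fileExists()
// instead of real filesystem access. Not CodeGraph's actual resolver.

type PathsMap = Record<string, string[]>;

// Suffixes tried after substitution, in order (illustrative choice).
const EXTENSIONS = ["", ".ts", ".tsx", ".d.ts", "/index.ts"];

// Return the text captured by `*`, "" for an exact key match, or null if no match.
function matchPattern(pattern: string, specifier: string): string | null {
  const star = pattern.indexOf("*");
  if (star === -1) return pattern === specifier ? "" : null;
  const prefix = pattern.slice(0, star);
  const suffix = pattern.slice(star + 1);
  const fits =
    specifier.length >= prefix.length + suffix.length &&
    specifier.startsWith(prefix) &&
    specifier.endsWith(suffix);
  return fits ? specifier.slice(prefix.length, specifier.length - suffix.length) : null;
}

// Exact keys win; otherwise the pattern with the longest prefix before `*` wins.
function specificity(pattern: string): number {
  const star = pattern.indexOf("*");
  return star === -1 ? Number.MAX_SAFE_INTEGER : star;
}

// Join two paths and normalise "." and ".." segments.
function joinPath(base: string, rel: string): string {
  const parts: string[] = [];
  for (const seg of `${base}/${rel}`.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") parts.pop();
    else parts.push(seg);
  }
  return "/" + parts.join("/");
}

function resolveAlias(
  specifier: string,
  paths: PathsMap,
  baseUrl: string,
  fileExists: (path: string) => boolean,
): string | null {
  const best = Object.keys(paths)
    .map((pattern) => ({ pattern, captured: matchPattern(pattern, specifier) }))
    .filter((c): c is { pattern: string; captured: string } => c.captured !== null)
    .sort((a, b) => specificity(b.pattern) - specificity(a.pattern))[0];
  if (!best) return null;
  // Try each substitution in order; the first existing file wins.
  for (const target of paths[best.pattern]) {
    const candidate = joinPath(baseUrl, target.replace("*", () => best.captured));
    for (const ext of EXTENSIONS) {
      if (fileExists(candidate + ext)) return candidate + ext;
    }
  }
  return null;
}

const repoFiles = new Set(["/repo/src/utils/date.ts", "/repo/src/app/index.ts"]);
const aliases: PathsMap = { "@utils/*": ["src/utils/*"], "@app": ["src/app"] };
console.log(resolveAlias("@utils/date", aliases, "/repo", (p) => repoFiles.has(p))); // /repo/src/utils/date.ts
```

Injecting `fileExists` keeps the matching logic pure, so the same function can run against an in-memory file list built during indexing.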

Layer 3: Framework Detection

CodeGraph recognizes web-framework routing files and emits route nodes linked by references edges to their handler classes or functions. This means querying callers of a view/controller surfaces the URL pattern that binds it. Supported frameworks include Django, Flask, FastAPI, Express, Laravel, Rails, Spring, Gin, Axum, ASP.NET, Vapor, React Router, and SvelteKit.
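A sketch of why querying callers surfaces the URL pattern: if each route node points at its handler through a `references` edge, an incoming-edge lookup on the handler returns the route alongside ordinary callers. The shapes, ids, and function names below are hypothetical; the example mirrors a Django `path("users/<int:pk>/", views.user_detail)` entry.

```typescript
// Hypothetical sketch: each route node points at its handler with a "references"
// edge, so an incoming-edge lookup on the handler also surfaces the URL pattern.
// Shapes and ids are illustrative, not CodeGraph's schema.

type Kind = "route" | "function" | "class" | "method";
type EdgeKind = "calls" | "references";

interface SymbolNode { id: string; kind: Kind; name: string }
interface RefEdge { source: string; target: string; kind: EdgeKind }

// Everything that calls or references the handler (routes included).
function usersOf(nodes: Map<string, SymbolNode>, edges: RefEdge[], handlerId: string): SymbolNode[] {
  return edges
    .filter((e) => e.target === handlerId && (e.kind === "calls" || e.kind === "references"))
    .map((e) => nodes.get(e.source))
    .filter((n): n is SymbolNode => n !== undefined);
}

// Just the URL patterns bound to the handler.
function routesFor(nodes: Map<string, SymbolNode>, edges: RefEdge[], handlerId: string): string[] {
  return usersOf(nodes, edges, handlerId)
    .filter((n) => n.kind === "route")
    .map((n) => n.name);
}

const nodeList: SymbolNode[] = [
  { id: "urls.py:users/<int:pk>/", kind: "route", name: "users/<int:pk>/" },
  { id: "views.py:user_detail", kind: "function", name: "user_detail" },
  { id: "views.py:legacy_profile", kind: "function", name: "legacy_profile" },
];
const nodes = new Map<string, SymbolNode>(nodeList.map((n): [string, SymbolNode] => [n.id, n]));
const edges: RefEdge[] = [
  { source: "urls.py:users/<int:pk>/", target: "views.py:user_detail", kind: "references" },
  { source: "views.py:legacy_profile", target: "views.py:user_detail", kind: "calls" },
];

console.log(usersOf(nodes, edges, "views.py:user_detail").map((n) => n.name).join(", ")); // users/<int:pk>/, legacy_profile
```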

Layer 4: SQLite Storage

Everything goes into a local SQLite database (.codegraph/codegraph.db) with FTS5 full-text search. The database uses better-sqlite3 (native) when available and transparently falls back to node-sqlite3-wasm for environments without native bindings. No data ever leaves your machine.

Layer 5: Auto-Sync

The MCP server watches your project using native OS file events (FSEvents on macOS, inotify on Linux, ReadDirectoryChangesW on Windows). Changes are debounced with a 2-second quiet window, filtered to source files only, and incrementally synced. The graph stays fresh as you code with zero configuration.
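The debounce-and-filter step can be sketched as a small batcher. This is an illustrative sketch rather than CodeGraph's watcher: the extension list is an assumption, and time is passed in explicitly so the 2-second quiet window can be checked deterministically instead of with real timers.

```typescript
// Minimal sketch of a debounced, filtered change queue.
// Assumptions: the extension list is illustrative, and time is passed in explicitly
// (milliseconds) so the logic is deterministic; a real watcher would be driven by
// OS file events and a timer. Not CodeGraph's implementation.

const QUIET_WINDOW_MS = 2000;
const SOURCE_EXTENSIONS = [".ts", ".tsx", ".js", ".py", ".go", ".rs", ".java", ".swift"];

class ChangeBatcher {
  private pending = new Set<string>();
  private lastEventAt = Number.NEGATIVE_INFINITY;
  private readonly quietWindowMs: number;

  constructor(quietWindowMs: number = QUIET_WINDOW_MS) {
    this.quietWindowMs = quietWindowMs;
  }

  // Record a file event at time `now`; non-source files are dropped and do not
  // reset the quiet window.
  record(path: string, now: number): void {
    if (!SOURCE_EXTENSIONS.some((ext) => path.endsWith(ext))) return;
    this.pending.add(path);
    this.lastEventAt = now;
  }

  // Hand back the batch for an incremental sync once no source event has arrived
  // for a full quiet window; otherwise return an empty list and keep waiting.
  flushIfQuiet(now: number): string[] {
    if (this.pending.size === 0 || now - this.lastEventAt < this.quietWindowMs) return [];
    const batch = Array.from(this.pending).sort();
    this.pending.clear();
    return batch;
  }
}

const batcher = new ChangeBatcher();
batcher.record("src/a.ts", 0);
batcher.record("README.md", 500); // not a source file, ignored
batcher.record("src/b.py", 1000);
const early = batcher.flushIfQuiet(2500); // [] because the window restarted at t=1000
const ready = batcher.flushIfQuiet(3000); // ["src/a.ts", "src/b.py"]
console.log(early.length, ready.join(", "));
```

Coalescing a burst of saves into one batch means a single incremental sync runs after the editor goes quiet, rather than one per keystroke-triggered write.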

Takeaway: With just npx @colbymchenry/codegraph and codegraph init -i, your AI agents gain instant structural knowledge of your entire codebase – no manual configuration, no API keys, no external services.

Benchmark Results

Tested across 6 real-world codebases comparing Claude Code’s Explore agent with and without CodeGraph:

Codebase	With CodeGraph	Without CodeGraph	Improvement
VS Code (TypeScript)	3 calls, 17s	52 calls, 1m 37s	94% fewer, 82% faster
Excalidraw (TypeScript)	3 calls, 29s	47 calls, 1m 45s	94% fewer, 72% faster
Claude Code (Python+Rust)	3 calls, 39s	40 calls, 1m 8s	93% fewer, 43% faster
Claude Code (Java)	1 call, 19s	26 calls, 1m 22s	96% fewer, 77% faster
Alamofire (Swift)	3 calls, 22s	32 calls, 1m 39s	91% fewer, 78% faster
Swift Compiler (Swift/C++)	6 calls, 35s	37 calls, 2m 8s	84% fewer, 73% faster
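The Improvement column can be recomputed from the raw counts and times in the table. This short TypeScript check also pools the six runs into single totals; the pooled figure is derived here and is not a number reported by the benchmark itself.

```typescript
// Recomputes the "Improvement" column from the raw numbers in the table above,
// as percentage reductions rounded to the nearest whole percent.

interface Run { calls: number; seconds: number }
interface Row { codebase: string; withCG: Run; withoutCG: Run }

const rows: Row[] = [
  { codebase: "VS Code", withCG: { calls: 3, seconds: 17 }, withoutCG: { calls: 52, seconds: 97 } },
  { codebase: "Excalidraw", withCG: { calls: 3, seconds: 29 }, withoutCG: { calls: 47, seconds: 105 } },
  { codebase: "Claude Code (Python+Rust)", withCG: { calls: 3, seconds: 39 }, withoutCG: { calls: 40, seconds: 68 } },
  { codebase: "Claude Code (Java)", withCG: { calls: 1, seconds: 19 }, withoutCG: { calls: 26, seconds: 82 } },
  { codebase: "Alamofire", withCG: { calls: 3, seconds: 22 }, withoutCG: { calls: 32, seconds: 99 } },
  { codebase: "Swift Compiler", withCG: { calls: 6, seconds: 35 }, withoutCG: { calls: 37, seconds: 128 } },
];

// Reduction from `before` to `after` in whole percent. Multiplying before dividing
// keeps exact halves such as 92.5 exact, so Math.round rounds them up as intended.
function reduction(before: number, after: number): number {
  return Math.round((100 * (before - after)) / before);
}

const fewerCalls = rows.map((r) => reduction(r.withoutCG.calls, r.withCG.calls));
const faster = rows.map((r) => reduction(r.withoutCG.seconds, r.withCG.seconds));
rows.forEach((r, i) => console.log(`${r.codebase}: ${fewerCalls[i]}% fewer calls, ${faster[i]}% faster`));

// Pooled over all six runs: total calls and total seconds, with vs. without.
const sum = (xs: number[]): number => xs.reduce((a, b) => a + b, 0);
const pooledCalls = reduction(sum(rows.map((r) => r.withoutCG.calls)), sum(rows.map((r) => r.withCG.calls)));
const pooledTime = reduction(sum(rows.map((r) => r.withoutCG.seconds)), sum(rows.map((r) => r.withCG.seconds)));
console.log(`pooled: ${pooledCalls}% fewer calls, ${pooledTime}% faster`);
```

Rounding each row reproduces the table exactly. Pooled over all six codebases (19 vs. 234 tool calls, 161 s vs. 579 s), the reductions come out to about 92% fewer tool calls and 72% less time.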

Amazing: The Swift Compiler benchmark tested the largest codebase (25,874 files, 272,898 nodes) – CodeGraph indexed it in under 4 minutes and the agent answered a complex cross-cutting question with 6 explore calls and zero file reads in 35 seconds.

Key observations from the benchmarks:

With CodeGraph, agents never fell back to reading files – they trusted the graph results completely
Without CodeGraph, agents spent most time on discovery (find, ls, grep) before reading relevant code
Cross-language queries (Python+Rust) worked seamlessly – graph traversal found connections across language boundaries
The Alamofire benchmark traced a 9-step call chain from Session.request() to URLSession.dataTask() in a single explore call
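Tracing a call chain like this amounts to a path search over `calls` edges. A minimal breadth-first sketch follows; the adjacency list and the intermediate names `Session.perform`, `Session.createTask`, and `Logger.log` are made up for illustration and are not Alamofire's real call graph.

```typescript
// Minimal sketch: shortest call chain between two symbols, found by breadth-first
// search over "calls" edges. All intermediate names are hypothetical.

type CallGraph = Map<string, string[]>; // caller -> direct callees

function callChain(graph: CallGraph, from: string, to: string): string[] | null {
  // previous[x] = the caller through which x was first reached (null for the start).
  const previous = new Map<string, string | null>([[from, null]]);
  const queue: string[] = [from];
  while (queue.length > 0) {
    const current = queue.shift()!;
    if (current === to) {
      // Walk back through predecessors to rebuild the chain in call order.
      const chain: string[] = [];
      let step: string | null = current;
      while (step !== null) {
        chain.unshift(step);
        step = previous.get(step) ?? null;
      }
      return chain;
    }
    for (const callee of graph.get(current) ?? []) {
      if (!previous.has(callee)) {
        previous.set(callee, current);
        queue.push(callee);
      }
    }
  }
  return null; // no chain exists
}

const graph: CallGraph = new Map<string, string[]>([
  ["Session.request", ["Session.perform", "Logger.log"]],
  ["Session.perform", ["Session.createTask"]],
  ["Session.createTask", ["URLSession.dataTask"]],
]);

console.log(callChain(graph, "Session.request", "URLSession.dataTask")?.join(" -> "));
// Session.request -> Session.perform -> Session.createTask -> URLSession.dataTask
```

Because breadth-first search expands callees level by level, the first chain it reaches is a shortest one measured in calls.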

MCP Tools

When running as an MCP server, CodeGraph exposes 8 tools to AI coding agents:

[Figure: CodeGraph MCP Tools]

Understanding the MCP Tools

CodeGraph’s 8 MCP tools are organized into four categories: Search, Navigation, Impact Analysis, and Context Building. Each tool queries the local SQLite knowledge graph and returns structured results instantly.

Search Tools

Tool	Purpose
`codegraph_search`	Find symbols by name across the entire codebase using FTS5 full-text search
`codegraph_files`	Get indexed file structure – faster than filesystem scanning
`codegraph_status`	Check index health, statistics, and which SQLite backend is active

Navigation Tools

Tool	Purpose
`codegraph_callers`	Find what calls a function – trace incoming call chains
`codegraph_callees`	Find what a function calls – trace outgoing call chains
`codegraph_node`	Get details about a specific symbol, optionally with source code

Impact Analysis

Tool	Purpose
`codegraph_impact`	Analyze what code is affected by changing a symbol – essential before refactoring
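Conceptually, impact analysis walks dependencies in reverse: anything that calls or references the changed symbol is affected, and so is anything that depends on those. A minimal sketch, assuming a flat edge list and a depth limit; `impactRadius` and its parameters are hypothetical and not the `codegraph_impact` interface.

```typescript
// Minimal sketch of impact analysis: walk incoming edges breadth-first from the
// changed symbol and record how many hops away each affected symbol is.
// The edge shape and depth limit are assumptions, not codegraph_impact's interface.

interface DepEdge { from: string; to: string } // `from` calls or references `to`

function impactRadius(edges: DepEdge[], symbol: string, maxDepth: number): Map<string, number> {
  // Invert the edges once: target -> symbols that depend on it.
  const dependents = new Map<string, string[]>();
  for (const e of edges) {
    const list = dependents.get(e.to) ?? [];
    list.push(e.from);
    dependents.set(e.to, list);
  }
  const hops = new Map<string, number>();
  let frontier: string[] = [symbol];
  for (let depth = 1; depth <= maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const s of frontier) {
      for (const d of dependents.get(s) ?? []) {
        // Skip the starting symbol and anything already reached (handles cycles).
        if (d !== symbol && !hops.has(d)) {
          hops.set(d, depth);
          next.push(d);
        }
      }
    }
    frontier = next;
  }
  return hops;
}

const edges: DepEdge[] = [
  { from: "LoginController.post", to: "UserService.login" },
  { from: "AuthMiddleware.check", to: "UserService.login" },
  { from: "route:POST /login", to: "LoginController.post" },
  { from: "login.test.ts", to: "LoginController.post" },
];

// Depth 1: direct dependents only; depth 2 also reaches the route and the test.
console.log(impactRadius(edges, "UserService.login", 1).size); // 2
console.log(impactRadius(edges, "UserService.login", 2).size); // 4
```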

Context Building

Tool	Purpose
`codegraph_context`	Build relevant code context for a task – returns entry points, related symbols, and code snippets in one call

The codegraph_context tool is the most powerful. It combines search, navigation, and code retrieval into a single call that returns everything an agent needs to understand a code area. This is what replaces the dozens of tool calls (26 to 52 in the benchmarks above) that agents otherwise make during exploration.
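The three stages named above (search, navigation, retrieval) can be sketched as one function with a node budget. This is a simplified illustration, not `codegraph_context` itself: plain name matching stands in for FTS5 search, `sketchContext` and `CtxNode` are hypothetical, and the `maxNodes` budget only loosely mirrors the CLI's `--max-nodes` option.

```typescript
// Simplified sketch of one-call context building: (1) search for entry points whose
// names match words from the task, (2) expand through graph neighbours breadth-first,
// (3) return a snippet per selected symbol, all within a node budget.
// Name matching stands in for FTS5 search; all shapes and names are hypothetical.

interface CtxNode { id: string; name: string; snippet: string }
interface Context { entryPoints: string[]; related: string[]; snippets: string[] }

function sketchContext(
  nodes: CtxNode[],
  neighbours: Map<string, string[]>,
  task: string,
  maxNodes: number,
): Context {
  const byId = new Map(nodes.map((n): [string, CtxNode] => [n.id, n]));
  const words = task.toLowerCase().split(/\W+/).filter((w) => w.length > 3);

  // 1. Search: any symbol whose name contains a task word is an entry point.
  const entryPoints = nodes
    .filter((n) => words.some((w) => n.name.toLowerCase().includes(w)))
    .map((n) => n.id)
    .slice(0, maxNodes);

  // 2. Navigate: breadth-first expansion until the budget is used up.
  const selected = new Set(entryPoints);
  const queue = entryPoints.slice();
  while (queue.length > 0 && selected.size < maxNodes) {
    const id = queue.shift()!;
    for (const n of neighbours.get(id) ?? []) {
      if (selected.size >= maxNodes) break;
      if (!selected.has(n)) {
        selected.add(n);
        queue.push(n);
      }
    }
  }

  // 3. Retrieve: snippets in selection order (entry points first).
  const entrySet = new Set(entryPoints);
  const ids = Array.from(selected);
  return {
    entryPoints,
    related: ids.filter((id) => !entrySet.has(id)),
    snippets: ids.map((id) => byId.get(id)?.snippet ?? ""),
  };
}

const ctxNodes: CtxNode[] = [
  { id: "UserService.login", name: "UserService.login", snippet: "login(user, password) { ... }" },
  { id: "hashPassword", name: "hashPassword", snippet: "function hashPassword(p) { ... }" },
  { id: "SessionStore.create", name: "SessionStore.create", snippet: "create(userId) { ... }" },
  { id: "Logger.info", name: "Logger.info", snippet: "info(msg) { ... }" },
];
const ctxNeighbours = new Map<string, string[]>([
  ["UserService.login", ["hashPassword", "SessionStore.create"]],
  ["hashPassword", ["Logger.info"]],
]);

const ctx = sketchContext(ctxNodes, ctxNeighbours, "fix the login flow", 3);
console.log(ctx.related.join(", ")); // hashPassword, SessionStore.create
```

With a budget of 3, `Logger.info` (two hops from the entry point) is left out; the budget is what keeps a single context call from flooding the agent's window.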

Important: The main Claude Code session should only use lightweight tools (search, callers, callees, impact, node) for targeted lookups. For exploration questions, always spawn an Explore agent that uses codegraph_context as its primary tool – this prevents large code sections from filling up the main session context.

Supported Languages

CodeGraph supports 19+ languages through tree-sitter grammar parsing; the 17 named languages are grouped by ecosystem below:

Category	Languages
Web/Scripting	TypeScript, JavaScript, Python, Ruby, PHP, Dart, Svelte, Liquid
Systems	Go, Rust, C, C++
JVM	Java, Kotlin
Apple	Swift
.NET	C#
Other	Pascal/Delphi

Framework-aware route detection covers Django, Flask, FastAPI, Express, Laravel, Rails, Spring, Gin, chi, gorilla/mux, Axum, actix, Rocket, ASP.NET, Vapor, React Router, and SvelteKit.

Installation

Quick Install (Recommended)

npx @colbymchenry/codegraph

The interactive installer will:

Ask which agent(s) to configure, auto-detecting which of Claude Code, Cursor, Codex CLI, and opencode are installed
Prompt to install codegraph on your PATH (so agents can launch the MCP server)
Ask whether configs apply to all your projects or just this one
Write each chosen agent’s MCP server config + instructions file (e.g., CLAUDE.md, .cursor/rules/codegraph.mdc, ~/.codex/AGENTS.md)
Set up auto-allow permissions when Claude Code is one of the targets
Initialize your current project (local installs only)

Non-Interactive Install (CI/Scripting)

# Auto-detect agents, install global
codegraph install --yes

# Explicit target list
codegraph install --target=cursor,claude --yes

# Detected agents, project-local
codegraph install --target=auto --location=local

# Print config snippet without writing files
codegraph install --print-config codex

Flag	Values	Default
`--target`	`auto`, `all`, `none`, or csv (`claude,cursor,...`)	prompt
`--location`	`global`, `local`	prompt
`--yes`	(boolean)	prompt every step
`--no-permissions`	(boolean) skip Claude auto-allow list	permissions on
`--print-config <id>`	dump snippet for one agent and exit	–
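To illustrate how the `--target` values relate to one another, here is a hypothetical resolver. The agent ids `claude`, `cursor`, and `codex` follow the examples above, `opencode` is assumed, and none of this is the installer's actual code.

```typescript
// Hypothetical sketch of how `--target` values could resolve to a list of agents.
// Agent ids are assumptions; this is not the installer's code.

const KNOWN_AGENTS = ["claude", "cursor", "codex", "opencode"] as const;
type AgentId = (typeof KNOWN_AGENTS)[number];

function resolveTargets(flag: string, detected: AgentId[]): AgentId[] {
  switch (flag) {
    case "auto":
      return detected.slice(); // only agents found on this machine
    case "all":
      return KNOWN_AGENTS.slice(); // every supported agent
    case "none":
      return []; // configure nothing
    default: {
      // Comma-separated list such as "cursor,claude".
      const known = new Set<string>(KNOWN_AGENTS);
      const ids = flag.split(",").map((s) => s.trim()).filter((s) => s.length > 0);
      const unknown = ids.filter((id) => !known.has(id));
      if (unknown.length > 0) throw new Error(`unknown target(s): ${unknown.join(", ")}`);
      return ids as AgentId[];
    }
  }
}

console.log(resolveTargets("cursor,claude", []).join(",")); // cursor,claude
```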

Initialize Projects

cd your-project
codegraph init -i

This builds the per-project knowledge graph index. It also wires up any project-local agent surfaces (e.g., Cursor’s .cursor/rules/codegraph.mdc) so a single global codegraph install works in every project you open.

Restart Your Agent

Restart your agent (Claude Code / Cursor / Codex CLI / opencode) for the MCP server to load. Your agent will use CodeGraph tools automatically when a .codegraph/ directory exists.

CLI Reference

codegraph                         # Run interactive installer
codegraph install                 # Run installer (explicit)
codegraph init [path]             # Initialize in a project (--index to also index)
codegraph uninit [path]           # Remove CodeGraph from a project (--force to skip prompt)
codegraph index [path]            # Full index (--force to re-index, --quiet for less output)
codegraph sync [path]             # Incremental update
codegraph status [path]           # Show statistics
codegraph query <search>          # Search symbols (--kind, --limit, --json)
codegraph files [path]            # Show file structure (--format, --filter, --max-depth, --json)
codegraph context <task>          # Build context for AI (--format, --max-nodes)
codegraph affected [files...]     # Find test files affected by changes
codegraph serve --mcp             # Start MCP server


Affected Files for CI

The codegraph affected command traces import dependencies transitively to find which test files are affected by changed source files:

# Pass files as arguments
codegraph affected src/utils.ts src/api.ts

# Pipe from git diff
git diff --name-only | codegraph affected --stdin

# Custom test file pattern
codegraph affected src/auth.ts --filter "e2e/*"

Option	Description	Default
`--stdin`	Read file list from stdin	`false`
`-d, --depth <n>`	Max dependency traversal depth	`5`
`-f, --filter <glob>`	Custom glob to identify test files	auto-detect
`-j, --json`	Output as JSON	`false`
`-q, --quiet`	Output file paths only	`false`
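The traversal behind `codegraph affected` can be sketched as a breadth-first walk over the reverse import graph, stopping at a depth limit and keeping only files that look like tests. This is an assumption-laden sketch, not codegraph's implementation: the import map, the default test-file regex, and the exact depth semantics are illustrative stand-ins for `--depth` and `--filter`.

```typescript
// Minimal sketch of "affected tests": walk the reverse import graph from the changed
// files up to maxDepth hops and keep files that look like tests. The test regex and
// depth semantics are assumptions, not codegraph's implementation.

type ImportGraph = Map<string, string[]>; // file -> files it imports

const DEFAULT_TEST_PATTERN = /\.(test|spec)\.[jt]sx?$/;

function affectedTests(
  imports: ImportGraph,
  changed: string[],
  maxDepth = 5,
  isTest: (file: string) => boolean = (f) => DEFAULT_TEST_PATTERN.test(f),
): string[] {
  // Invert the import graph: file -> files that import it.
  const importers = new Map<string, string[]>();
  imports.forEach((deps, file) => {
    for (const dep of deps) {
      const list = importers.get(dep) ?? [];
      list.push(file);
      importers.set(dep, list);
    }
  });
  const seen = new Set<string>(changed);
  let frontier = changed.slice();
  for (let depth = 0; depth < maxDepth && frontier.length > 0; depth++) {
    const next: string[] = [];
    for (const f of frontier) {
      for (const importer of importers.get(f) ?? []) {
        if (!seen.has(importer)) {
          seen.add(importer);
          next.push(importer);
        }
      }
    }
    frontier = next;
  }
  // Changed test files count as affected too.
  return Array.from(seen).filter(isTest).sort();
}

const imports: ImportGraph = new Map<string, string[]>([
  ["src/utils.ts", []],
  ["src/api.ts", ["src/utils.ts"]],
  ["src/utils.test.ts", ["src/utils.ts"]],
  ["src/api.test.ts", ["src/api.ts"]],
  ["src/app.ts", ["src/api.ts"]],
  ["src/app.test.ts", ["src/app.ts"]],
]);

console.log(affectedTests(imports, ["src/utils.ts"]).join("\n"));
```

Walking importers rather than imports is the key design choice: a change to `src/utils.ts` matters to whoever imports it, directly or transitively, so the depth limit bounds how far that ripple is followed.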

CI/hook example:

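#!/usr/bin/env bash
# Test files affected by uncommitted changes (working tree and index vs. HEAD)
AFFECTED=$(git diff --name-only HEAD | codegraph affected --stdin --quiet)
# Run the test runner only when at least one test file is affected
if [ -n "$AFFECTED" ]; then
  # $AFFECTED is left unquoted so each path becomes its own argument
  npx vitest run $AFFECTED
fi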

Library Usage

CodeGraph can also be used as a TypeScript library:

import CodeGraph from '@colbymchenry/codegraph';

const cg = await CodeGraph.init('/path/to/project');
// Or: const cg = await CodeGraph.open('/path/to/project');

await cg.indexAll({
  onProgress: (p) => console.log(`${p.phase}: ${p.current}/${p.total}`)
});

const results = cg.searchNodes('UserService');
const callers = await cg.getCallers('UserService.login');
const impact = await cg.getImpactRadius('UserService.login');
const context = await cg.buildContext('implement user authentication');

await cg.watch();  // Auto-sync on file changes
await cg.close();

Key Features Summary

Feature	Description
Smart Context Building	One tool call returns entry points, related symbols, and code snippets
Full-Text Search	Find code by name instantly across your entire codebase, powered by FTS5
Impact Analysis	Trace callers, callees, and the full impact radius of any symbol
Always Fresh	File watcher uses native OS events with debounced auto-sync
19+ Languages	TypeScript, JavaScript, Python, Go, Rust, Java, C#, PHP, Ruby, C, C++, Swift, Kotlin, Dart, Svelte, Liquid, Pascal/Delphi
Framework-aware Routes	Recognizes web-framework routing files and links URL patterns to their handlers
100% Local	No data leaves your machine. No API keys. No external services. SQLite database only
Multi-Agent Support	Works with Claude Code, Cursor, Codex CLI, and opencode

Troubleshooting

Issue	Solution
MCP server not loading	Restart your agent after running the installer. Check that `codegraph` is on your PATH.
Index not updating	Run `codegraph status` to check index health. Use `codegraph sync` for manual sync or `codegraph index --force` for a full re-index.
Cursor working directory issue	The installer injects `--path` into Cursor’s MCP args to handle Cursor’s cwd quirk. Re-run the installer if you moved your project.
Node version error	CodeGraph requires Node.js 18+ and blocks Node 25.x. Check your Node version with `node --version`.
WASM fallback slow	If `better-sqlite3` native binding is unavailable, CodeGraph falls back to `node-sqlite3-wasm` which is slower. Install build tools for native compilation.
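The Node.js version rule from the table can be expressed as a small check. A sketch only: the parsing handles strings like `v22.3.0`, and `isSupportedNode` is a hypothetical name, not part of CodeGraph.

```typescript
// Hypothetical check mirroring the rule above: Node.js 18 or newer, excluding 25.x.
// Only the major version is inspected, from strings like "v22.3.0" or "20.11.1".

function isSupportedNode(version: string): boolean {
  const match = /^v?(\d+)\./.exec(version.trim());
  if (match === null) return false; // unparseable version string
  const major = Number(match[1]);
  return major >= 18 && major !== 25;
}

// In Node.js, the running version is available as process.version.
for (const v of ["v16.20.2", "v18.19.0", "v22.3.0", "v25.0.0"]) {
  console.log(v, isSupportedNode(v) ? "supported" : "not supported");
}
```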

Conclusion

CodeGraph solves a fundamental problem in AI-assisted coding: the wasteful exploration loop. By pre-indexing codebases into a semantic knowledge graph with tree-sitter, it gives AI agents instant structural understanding that replaces dozens of file-scanning tool calls with a handful of graph queries. The benchmarks speak for themselves: 84% to 96% fewer tool calls and 43% to 82% faster exploration across real-world codebases like VS Code, Excalidraw, and the Swift Compiler. With 19+ language support, framework-aware route detection, 100% local processing, and automatic file watching, CodeGraph is a zero-configuration productivity multiplier for any developer using AI coding agents.

Repository: https://github.com/colbymchenry/codegraph

npm: @colbymchenry/codegraph

License: MIT
