Free Claude Code: Use Claude Code CLI and VSCode for Free with NVIDIA NIM, OpenRouter, and Local Models

Free Claude Code is a lightweight proxy that routes Claude Code’s Anthropic API calls to free or low-cost alternatives including NVIDIA NIM (40 requests per minute free), OpenRouter (hundreds of free and paid models), DeepSeek (direct API), LM Studio (fully local), and llama.cpp (local inference). With 9,833+ stars and growing, this open-source project enables developers to use Claude Code’s powerful CLI and VSCode extension without requiring an Anthropic API key or paid subscription.

Free Claude Code Architecture

Understanding the Free Claude Code Architecture

The architecture diagram above shows how Free Claude Code acts as a transparent proxy between Claude Code and various LLM providers. Let’s examine each component:

Component 1: Claude Code Client The Claude Code CLI or VSCode extension sends standard Anthropic API requests in Server-Sent Events (SSE) format. The client believes it is communicating directly with Anthropic’s servers, but the requests are intercepted by the proxy.

Component 2: Free Claude Code Proxy The proxy server runs on localhost (default port 8082) and implements Claude-compatible API endpoints including GET /v1/models, POST /v1/messages, POST /v1/messages/count_tokens, plus HEAD and OPTIONS support for common probe endpoints. The proxy handles request detection, model routing, format translation, and response streaming.

Component 3: LLM Providers The proxy supports five provider backends:

NVIDIA NIM: 40 requests per minute free tier, recommended for daily use
OpenRouter: Access to hundreds of models including free tiers from various providers
DeepSeek: Direct API access to DeepSeek chat and reasoner models
LM Studio: Fully local inference with no API key required
llama.cpp: Lightweight local inference engine via llama-server

Data Flow: Claude Code sends Anthropic-format requests to the proxy. The proxy detects trivial requests (quota probes, title generation, prefix detection, suggestions, filepath extraction) and responds locally without consuming API quota. Non-trivial requests are routed to the appropriate provider based on model mapping (Opus, Sonnet, Haiku, or fallback). The provider response is translated back to Anthropic format and streamed to Claude Code.

Request Flow: How Requests Are Processed

Free Claude Code Request Flow

Step 1: Request Detection

When Claude Code sends a request, the proxy first checks if it is a trivial request that can be handled locally. Five categories of trivial requests are intercepted:

Quota probes and health checks
Title generation requests
Prefix detection requests
Suggestion mode requests
Filepath extraction requests

These local responses save API quota and reduce latency for common operations.

Step 2: Model Routing

For non-trivial requests, the proxy determines which model to use based on the request type:

Opus requests route to MODEL_OPUS
Sonnet requests route to MODEL_SONNET
Haiku requests route to MODEL_HAIKU
Unrecognized models fall back to MODEL

Each model variable uses the format provider_prefix/model/name, allowing different providers for different model tiers.

Step 3: Format Translation

The proxy translates between Anthropic Messages format and OpenAI chat format depending on the provider:

Native Anthropic providers: LM Studio and llama.cpp use native Anthropic Messages endpoints
OpenAI-compatible providers: NVIDIA NIM and DeepSeek use shared OpenAI chat translation
OpenRouter: Supports both formats depending on the selected model

Step 4: Response Streaming

Provider responses are translated back to Anthropic SSE format and streamed to Claude Code in real-time. When ENABLE_THINKING=true, thinking tokens from reasoning_content fields and ` ` tags are converted into native Claude thinking blocks.

Provider Comparison

Free Claude Code Providers

NVIDIA NIM (Recommended)

NVIDIA NIM offers a generous free tier with 40 requests per minute, making it ideal for daily development work. Popular models include MiniMax-M2.5, Qwen3.5, GLM-5, Kimi-K2.5, and Step-3.5-Flash. No credit card required for the free tier.

OpenRouter

OpenRouter provides access to hundreds of models from various providers, including free tiers. This is useful when you need model variety or fallback options. Free models include Arcee Trinity, Step-3.5-Flash, DeepSeek-R1, and GPT-OSS-120B.

DeepSeek

DeepSeek offers direct API access to their chat and reasoner models. This is ideal if you specifically want DeepSeek’s capabilities or prefer their pricing model over other providers.

LM Studio (Fully Local)

LM Studio enables completely local inference with no API key required and no rate limits. Load models like LiquidAI LFM2, MiniMax-M2.5, GLM-4.7-Flash, or Qwen3.5 in GGUF format. Best for privacy-sensitive work or offline development.

llama.cpp (Lightweight Local)

llama.cpp provides a lightweight local inference engine via llama-server. Ensure you have a tool-capable GGUF model loaded. This option is ideal for resource-constrained environments or when you want minimal overhead.

Key Features

Zero-Cost Operation

With NVIDIA NIM’s 40 req/min free tier and OpenRouter’s free models, you can use Claude Code for daily development without spending anything. Local options (LM Studio, llama.cpp) require only your own hardware.

Drop-in Replacement

Free Claude Code requires only two environment variables:

ANTHROPIC_BASE_URL: Point to the proxy (e.g., http://localhost:8082)
ANTHROPIC_AUTH_TOKEN: Optional authentication token

No modifications to Claude Code CLI or VSCode extension are needed.

Per-Model Mapping

Route Opus, Sonnet, and Haiku requests to different models and providers. Mix providers freely - for example, use NVIDIA NIM for Opus, OpenRouter for Sonnet, and LM Studio for Haiku.

Thinking Token Support

The proxy parses reasoning_content fields and ` ` tags from provider responses and converts them into native Claude thinking blocks when `ENABLE_THINKING=true`.

Heuristic Tool Parser

Models that output tool calls as text are automatically parsed into structured tool use, enabling tool-capable models that don’t natively support Anthropic’s tool format.

Smart Rate Limiting

Proactive rolling-window throttling plus reactive 429 exponential backoff with optional concurrency cap (PROVIDER_MAX_CONCURRENCY) prevents rate limit violations.

Subagent Control

Task tool interception forces run_in_background=False, preventing runaway subagents from consuming excessive resources.

Installation and Setup

Prerequisites

Get an API key for your chosen provider:
- NVIDIA NIM: build.nvidia.com/settings/api-keys
- OpenRouter: openrouter.ai/keys
- DeepSeek: platform.deepseek.com/api_keys
- LM Studio: No API key needed - download from lmstudio.ai
- llama.cpp: No API key needed - run llama-server locally
Install Claude Code from Anthropic’s repository

Quick Install

      
        # Install uv package manager
pip install uv

# Clone the repository
git clone https://github.com/Alishahryar1/free-claude-code.git
cd free-claude-code

# Copy environment template
cp .env.example .env

Configure Your Provider

Edit .env with your chosen provider:

NVIDIA NIM (recommended):

NVIDIA_NIM_API_KEY="nvapi-your-key-here"
MODEL="nvidia_nim/z-ai/glm4.7"
ENABLE_THINKING=true

OpenRouter:

OPENROUTER_API_KEY="sk-or-your-key-here"
MODEL_OPUS="open_router/deepseek/deepseek-r1-0528:free"
MODEL_SONNET="open_router/openai/gpt-oss-120b:free"
MODEL_HAIKU="open_router/stepfun/step-3.5-flash:free"
MODEL="open_router/stepfun/step-3.5-flash:free"

LM Studio (local):

MODEL_OPUS="lmstudio/unsloth/MiniMax-M2.5-GGUF"
MODEL_SONNET="lmstudio/unsloth/Qwen3.5-35B-A3B-GGUF"
MODEL_HAIKU="lmstudio/unsloth/GLM-4.7-Flash-GGUF"
MODEL="lmstudio/unsloth/GLM-4.7-Flash-GGUF"

Mix providers:

NVIDIA_NIM_API_KEY="nvapi-your-key-here"
OPENROUTER_API_KEY="sk-or-your-key-here"
MODEL_OPUS="nvidia_nim/moonshotai/kimi-k2.5"
MODEL_SONNET="open_router/deepseek/deepseek-r1-0528:free"
MODEL_HAIKU="lmstudio/unsloth/GLM-4.7-Flash-GGUF"
MODEL="nvidia_nim/z-ai/glm4.7"

Run the Proxy

Terminal 1 - Start the proxy server:

uv run uvicorn server:app --host 0.0.0.0 --port 8082

Terminal 2 - Run Claude Code:

PowerShell:

      
        $env:ANTHROPIC_AUTH_TOKEN="freecc"; $env:ANTHROPIC_BASE_URL="http://localhost:8082"; claude

Bash:

      
        ANTHROPIC_AUTH_TOKEN="freecc" ANTHROPIC_BASE_URL="http://localhost:8082" claude

VSCode Extension Setup

Start the proxy server
Open VSCode Settings (Ctrl + ,) and search for claude-code.environmentVariables
Click Edit in settings.json and add:

      
        "claudeCode.environmentVariables": [
  { "name": "ANTHROPIC_BASE_URL", "value": "http://localhost:8082" },
  { "name": "ANTHROPIC_AUTH_TOKEN", "value": "freecc" }
]

Reload extensions
If you see the login screen, click Anthropic Console, then authorize. The extension will start working.

Discord and Telegram Bot

Free Claude Code Discord Bot

Free Claude Code includes a messaging platform integration that enables remote autonomous coding via Discord or Telegram.

Capabilities

Tree-based message threading: Reply to a message to fork the conversation
Session persistence: Sessions survive server restarts
Live streaming: Real-time thinking tokens, tool calls, and results
Unlimited concurrent sessions: Controlled by PROVIDER_MAX_CONCURRENCY
Voice notes: Send voice messages that are transcribed and processed as prompts
Commands: /stop (cancel task), /clear (reset sessions), /stats

Discord Setup

Create a bot at the Discord Developer Portal
Enable Message Content Intent under Bot settings
Add to .env:

MESSAGING_PLATFORM="discord"
DISCORD_BOT_TOKEN="your_discord_bot_token"
ALLOWED_DISCORD_CHANNELS="123456789,987654321"
CLAUDE_WORKSPACE="./agent_workspace"
ALLOWED_DIR="C:/Users/yourname/projects"

Start the server and invite the bot with OAuth2 URL Generator (scopes: bot, permissions: Read Messages, Send Messages, Manage Messages, Read Message History)

Voice Notes

Voice messages are transcribed using Hugging Face Whisper (default, free and offline) or NVIDIA NIM. Install voice extras:

      
        # Local Whisper
uv sync --extra voice_local

# NVIDIA NIM voice
uv sync --extra voice

Configuration Reference

Core Variables

Variable	Description	Default
`MODEL`	Fallback model (`provider/model/name`)	`nvidia_nim/z-ai/glm4.7`
`MODEL_OPUS`	Model for Claude Opus requests	empty (falls back to MODEL)
`MODEL_SONNET`	Model for Claude Sonnet requests	empty (falls back to MODEL)
`MODEL_HAIKU`	Model for Claude Haiku requests	empty (falls back to MODEL)
`ENABLE_THINKING`	Enable reasoning/thinking blocks	`true`

Rate Limiting

Variable	Description	Default
`PROVIDER_RATE_LIMIT`	Requests per window	`40`
`PROVIDER_RATE_WINDOW`	Window in seconds	`60`
`PROVIDER_MAX_CONCURRENCY`	Max simultaneous streams	`5`

Request Optimization (enabled by default)

Variable	Description
`FAST_PREFIX_DETECTION`	Enable fast prefix detection
`ENABLE_NETWORK_PROBE_MOCK`	Mock network probe requests
`ENABLE_TITLE_GENERATION_SKIP`	Skip title generation
`ENABLE_SUGGESTION_MODE_SKIP`	Skip suggestion mode
`ENABLE_FILEPATH_EXTRACTION_MOCK`	Mock filepath extraction

Extending Free Claude Code

Adding a New Provider

Extend OpenAIChatTransport for OpenAI-compatible providers:

      
        from providers.openai_compat import OpenAIChatTransport
from providers.base import ProviderConfig

class MyProvider(OpenAIChatTransport):
    def __init__(self, config: ProviderConfig):
        super().__init__(config, provider_name="MYPROVIDER",
                         base_url="https://api.example.com/v1", api_key=config.api_key)

Adding a Messaging Platform

Extend MessagingPlatform and implement:

start(): Initialize the platform connection
stop(): Clean up resources
send_message(): Send a message to a channel
edit_message(): Edit an existing message
on_message(): Handle incoming messages

Project Structure

      
        free-claude-code/
├── server.py              # Entry point
├── api/                   # FastAPI routes, model routing, optimizations
├── core/                  # Anthropic protocol helpers, SSE, parsers
├── providers/             # Provider registry, transports
├── messaging/             # Discord/Telegram bots, voice, sessions
├── config/                # Settings, logging
├── cli/                   # CLI session management
└── tests/                 # Pytest test suite

Conclusion

Free Claude Code democratizes access to Claude Code’s powerful interface by enabling free and local alternatives to Anthropic’s API. Whether you choose NVIDIA NIM’s generous free tier, OpenRouter’s model variety, DeepSeek’s direct API, or fully local inference with LM Studio or llama.cpp, you can enjoy Claude Code’s agentic coding capabilities without subscription costs. The Discord/Telegram bot integration extends this accessibility to remote and collaborative workflows, making Free Claude Code a versatile addition to any developer’s toolkit.

Free Claude Code: Use Claude Code CLI and VSCode for Free with NVIDIA NIM, OpenRouter, and Local Models

Free Claude Code: Use Claude Code CLI and VSCode for Free with NVIDIA NIM, OpenRouter, and Local Models

Understanding the Free Claude Code Architecture

Request Flow: How Requests Are Processed

Step 1: Request Detection

Step 2: Model Routing

Step 3: Format Translation

Step 4: Response Streaming

Provider Comparison

NVIDIA NIM (Recommended)

OpenRouter

DeepSeek

LM Studio (Fully Local)

llama.cpp (Lightweight Local)

Key Features

Zero-Cost Operation

Drop-in Replacement

Per-Model Mapping

Thinking Token Support

Heuristic Tool Parser

Smart Rate Limiting

Subagent Control

Installation and Setup

Prerequisites

Quick Install

Configure Your Provider

Run the Proxy

VSCode Extension Setup

Discord and Telegram Bot

Capabilities

Discord Setup

Voice Notes

Configuration Reference

Core Variables

Rate Limiting

Request Optimization (enabled by default)

Extending Free Claude Code

Adding a New Provider

Adding a Messaging Platform

Project Structure

Conclusion

Links

Related guides

Related Posts

SEO Machine: AI-Powered Content Creation Workspace

CLI-Anything: Making ALL Software Agent-Native with Autom...

AGENTS.md: Anti-Sycophancy Operating Instructions That Ma...

Scrapling: Adaptive Web Scraping with AI Element Tracking...

Claude HowTo: Master Claude Code from Beginner to Power User

AI Website Cloner: Clone Any Website With One Command Usi...

Contents