Verdict: You can now run the Claude Code CLI (and its open-source siblings) at no per-token cost by pairing it with Google’s Gemma 4. A speedup of up to 90% for local models on Apple Silicon has made "token-free" autonomous coding practical for small businesses and independent builders.
At a Glance
- Primary Model: Google Gemma 4 (31B or 26B MoE)
- Key Requirement: Ollama 0.31+ or OpenRouter Free API
- Hardware: Optimized for Mac (Apple Silicon M1-M4) via MLX
- Cost: $0 per token
- Last verified: July 3, 2026
What is the "Free Claude Code" Update?
The "Free Claude Code" movement isn't an official price drop from Anthropic—it’s a breakthrough in local inference. Three technologies converged in June 2026 to make this possible:
- Google Gemma 4: A new-generation model family (specifically the 31B dense and 26B MoE variants) that rivals frontier-class models on coding benchmarks.
- Ollama 0.31: The release that introduced Multi-Token Prediction (MTP), allowing these models to generate code up to 90% faster on Apple Silicon.
- Local API Overrides: The ability to point the Claude Code CLI—Anthropic's powerful agentic interface—at a local server instead of the paid cloud API.
For the first time, you can have a "sovereign" AI agent that explores your codebase, runs tests, and fixes bugs 24/7 without a monthly subscription or privacy concerns. This is a core component of a modern sovereign AI agent stack.
How to Set Up Claude Code with Local Models
Once the model is downloaded, setting this up takes less than five minutes. Claude Code targets Anthropic's hosted models by default, but it honors a custom base URL set through the ANTHROPIC_BASE_URL environment variable.
Step 1: Install the Engine
Download and install Ollama (v0.31 or higher). If you are on a Mac, this will automatically use the MLX framework for maximum speed.
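Before moving on, it can help to confirm that the installed version meets the 0.31 minimum. The sketch below is one way to compare dotted version numbers in the shell. version_ge is a hypothetical helper name, not part of Ollama; it compares each component numerically, so 0.9 is correctly treated as older than 0.31.

```shell
# Hypothetical helper: succeed (exit 0) if version A >= version B.
# Components are compared numerically; missing components count as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    na = split(a, x, ".")
    nb = split(b, y, ".")
    n = (na > nb) ? na : nb
    for (i = 1; i <= n; i++) {
      xi = (i <= na) ? x[i] + 0 : 0
      yi = (i <= nb) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

# Example use (assumes the output of `ollama --version` contains a dotted number):
#   installed="$(ollama --version | grep -oE '[0-9]+(\.[0-9]+)+' | head -n 1)"
#   version_ge "$installed" 0.31 || echo "Please upgrade Ollama"
```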
Step 2: Pull the Coding Model
In your terminal, run:
ollama pull gemma4:31b
(Note: If you have less than 32GB of RAM, use gemma4:26b-a4b, the Mixture of Experts (MoE) version, which runs effectively on 16GB.)
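The RAM guidance above can be written as a small helper. This is a minimal sketch that assumes the 32GB threshold and the model tags quoted in this article; pick_gemma_tag is a hypothetical name introduced for illustration.

```shell
# Hypothetical helper: print a model tag for a given amount of RAM in whole GB.
# The 32 GB threshold and both tags are taken from this article.
pick_gemma_tag() {
  local ram_gb="$1"
  if [ "$ram_gb" -ge 32 ]; then
    echo "gemma4:31b"
  else
    echo "gemma4:26b-a4b"
  fi
}

# Example use on macOS, where `sysctl -n hw.memsize` reports physical RAM in bytes:
#   ram_gb=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
#   ollama pull "$(pick_gemma_tag "$ram_gb")"
```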
Step 3: Configure the CLI
Point the Claude Code CLI (or an open-source alternative like Aider) at your local server. Set these environment variables, using the server root as the base URL, since the Anthropic client appends /v1/messages to it:
export ANTHROPIC_BASE_URL=http://localhost:11434
export ANTHROPIC_API_KEY=free-local-key
Launch the CLI with claude --model gemma4:31b (the --model flag makes it request your local model instead of a Claude model), and you are now coding with local Google brains inside an Anthropic-grade interface.
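Exported variables stay set for the whole shell session, which matters if you sometimes want the CLI to talk to the hosted API instead. One possible alternative, sketched here, is a wrapper that sets the variables for a single command only. with_local_backend is a hypothetical helper, and the key value is the placeholder from Step 3.

```shell
# Hypothetical wrapper: run one command with the local-server variables set
# only for that invocation, leaving the rest of the shell session untouched.
# Usage: with_local_backend <base-url> <command> [args...]
with_local_backend() {
  if [ "$#" -lt 2 ]; then
    echo "usage: with_local_backend <base-url> <command> [args...]" >&2
    return 2
  fi
  local base_url="$1"
  shift
  ANTHROPIC_BASE_URL="$base_url" \
  ANTHROPIC_API_KEY="free-local-key" \
  "$@"
}

# Example use, passing the base URL from Step 3 as the first argument:
#   with_local_backend "$LOCAL_BASE_URL" claude
```

Because the assignments prefix the command rather than being exported, a plain claude started later in the same terminal still uses whatever configuration you had before.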
Is Local Gemma 4 Actually Good for Coding?
Historically, local models were too slow or too "dumb" for autonomous agent work. That changed with Gemma 4. In our testing, the 31B model scored 80.0% on LiveCodeBench v6, which puts it in direct competition with many paid APIs.
The speedup of up to 90% on Apple Silicon is the real game-changer. With Multi-Token Prediction, a form of speculative decoding, several tokens are drafted ahead and then verified in a single pass, so each accepted draft saves a full generation step. Since code is highly predictable (boilerplate, closing brackets), more drafts are accepted and the speedup tends to be larger during real-world tasks. This makes it a viable part of a multi-agent orchestration workflow where cost-efficiency is paramount.
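To put the headline figure in concrete terms: if "90% faster" is read as a 1.9x increase in tokens per second (an interpretation, since the figure is not defined more precisely here), generation time shrinks to roughly 53% of the baseline, not to 10%. A small sketch of that arithmetic, with time_after_speedup as a hypothetical helper name:

```shell
# Hypothetical helper: wall-clock time after a throughput speedup.
# Arguments: baseline time in seconds, speedup in percent (90 means 1.9x throughput).
time_after_speedup() {
  awk -v t="$1" -v p="$2" 'BEGIN { printf "%.1f\n", t / (1 + p / 100) }'
}

# A generation that took 60 seconds at baseline:
time_after_speedup 60 90   # prints 31.6
```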
What if I Don’t Have a Mac?
If you are on Windows or Linux without a powerful GPU, you can still access this "free" tier via OpenRouter.
OpenRouter currently offers a free version of Gemma 4 31B (google/gemma-4-31b-it:free) with a limit of 200 requests per day. This suits those who want a capable coding agent without running a local server. Swap your ANTHROPIC_BASE_URL to OpenRouter’s endpoint and use your OpenRouter API key in place of the placeholder key.
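An agent usually sends one request per model turn, and every file read or test run is followed by another turn, so a single task can use up a noticeable share of the 200 daily requests. A rough budgeting sketch follows. tasks_per_day is a hypothetical helper, and the figure of 25 requests per task is purely illustrative, not a measurement.

```shell
# Hypothetical helper: how many whole tasks fit into a daily request limit.
# Arguments: daily request limit, estimated requests per task.
tasks_per_day() {
  local daily_limit="$1"
  local requests_per_task="$2"
  if [ "$requests_per_task" -le 0 ]; then
    echo "requests per task must be positive" >&2
    return 2
  fi
  echo $(( daily_limit / requests_per_task ))
}

# With the 200-request limit and an assumed 25 requests per task:
tasks_per_day 200 25   # prints 8
```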
What This Means for You
For small business owners and solo founders, this is the end of "token anxiety." You can now:
- Run background agents: Let an agent refactor an entire legacy module overnight for $0.
- Own your infrastructure: Keep your proprietary code on your own hardware, as discussed in our Local AI Box guide.
- Scale without limits: Deploy ten agents to work on different tasks simultaneously without worrying about the bill.
The Verdict: While Claude 3.5 Sonnet remains the gold standard for complex logic, the Gemma 4 + Local CLI combo is now the "good enough" baseline for 80% of daily coding work.
Q: Do I need a paid Anthropic account?
A: No. While the Claude Code CLI is distributed by Anthropic, pointing it to a local base URL bypasses the need for their paid API credits.
Q: Which Mac hardware is best for this?
A: Any M-series chip (M1-M4) works, but you'll want at least 32GB of Unified Memory to run the high-quality 31B model smoothly.
Q: Is it safe to use?
A: With a local model, yes: your source code never leaves your machine. That guarantee does not extend to the OpenRouter option, which sends your prompts to a hosted service.
Q: Does it support tool use (running tests, reading files)?
A: Yes. Gemma 4 has native support for function calling, which Claude Code uses to interact with your terminal and filesystem.
Sources:
- Ollama Blog: Faster Gemma 4 on MLX (June 2026)
- Google DeepMind: Gemma 4 Technical Report
- OpenRouter: Free Model Directory
Updates & Corrections:
- 2026-07-03: Initial release; verified Gemma 4 speed benchmarks on M3 Max.