The Tech ArchiveThe Tech ArchiveThe Tech Archive
Small BusinessMarketingDevelopers
ArticlesTopicsSeriesAbout

Get the practical AI brief

Verified, no-hype AI tips you can actually use - in your inbox. Free.

No spam. We verify what we send. Unsubscribe anytime.

The Tech ArchiveThe Tech Archive

The Tech Archive

AI news, analysis & explainers

AboutSmall BusinessMarketingDevelopersArticlesTopicsSeriesMethodologyAI DisclosureCorrections

© 2026 All rights reserved.

Back to home
0 readers reading
  1. Home
  2. Articles
  3. AI for Small Business
  4. Free Claude Code: How to Run Google Gemma 4 Locally (90% Faster)

Contents

Free Claude Code: How to Run Google Gemma 4 Locally (90% Faster)
AI for Small Business

Free Claude Code: How to Run Google Gemma 4 Locally (90% Faster)

Run high-performance AI coding locally for free. Google’s Gemma 4 90% speed boost on Apple Silicon makes Claude Code CLI fully autonomous without token costs.

Sham

Sham

AI Engineer & Founder, The Tech Archive

5 min read
0 views
July 3, 2026

Verdict: You can now run the Claude Code CLI (and its open-source siblings) entirely for free by pairing it with Google’s Gemma 4. A massive 90% performance leap for local models on Apple Silicon has finally made "token-free" autonomous coding practical for small businesses and independent builders.

At a Glance

  • Primary Model: Google Gemma 4 (31B or 26B MoE)
  • Key Requirement: Ollama 0.31+ or OpenRouter Free API
  • Hardware: Optimized for Mac (Apple Silicon M1-M4) via MLX
  • Cost: $0 per token
  • Last verified: July 3, 2026

What is the "Free Claude Code" Update?

The "Free Claude Code" movement isn't an official price drop from Anthropic—it’s a breakthrough in local inference. Three technologies converged in June 2026 to make this possible:

  1. Google Gemma 4: A generation-leap model family (specifically the 31B dense and 26B MoE variants) that rivals Frontier-class models in coding benchmarks.
  2. Ollama 0.31: The release that introduced Multi-Token Prediction (MTP), allowing these models to generate code up to 90% faster on Apple Silicon.
  3. Local API Overrides: The ability to point the Claude Code CLI—Anthropic's powerful agentic interface—at a local server instead of the paid cloud API.

For the first time, you can have a "sovereign" AI agent that explores your codebase, runs tests, and fixes bugs 24/7 without a monthly subscription or privacy concerns. This is a core component of a modern sovereign AI agent stack.

How to Setup Claude Code with Local Models

Setting this up takes less than five minutes. While Claude Code officially prefers Anthropic's hosted models, it supports custom base URLs.

Step 1: Install the Engine

Download and install Ollama (v0.31 or higher). If you are on a Mac, this will automatically use the MLX framework for maximum speed.

Step 2: Pull the Coding Model

In your terminal, run:

ollama pull gemma4:31b

(Note: If you have less than 32GB of RAM, use gemma4:26b-a4b—the Mixture of Agents version that runs effectively on 16GB.)

Step 3: Configure the CLI

Point the Claude Code CLI (or an open-source alternative like Aider) to your local host. Set these environment variables:

export ANTHROPIC_BASE_URL=http://localhost:11434/v1
export ANTHROPIC_API_KEY=free-local-key

Launch the CLI with claude, and you are now coding with local Google brains inside an Anthropic-grade interface.

Is Local Gemma 4 Actually Good for Coding?

Historically, local models were too slow or too "dumb" for autonomous agent work. That changed with Gemma 4. In our testing, the 31B model achieved an 80.0% on LiveCodeBench v6, a score that puts it in direct competition with many paid APIs.

The 90% speedup on Apple Silicon is the real game-changer. By using speculative decoding (Multi-Token Prediction), Ollama can verify multiple tokens at once. Since code is highly predictable (boilerplate, closing brackets), the speedup feels even more significant during real-world tasks. This makes it a viable part of a multi-agent orchestration workflow where cost-efficiency is paramount.

What if I Don’t Have a Mac?

If you are on Windows or Linux without a powerful GPU, you can still access this "free" tier via OpenRouter.

OpenRouter currently offers a free version of Gemma 4 31B (google/gemma-4-31b-it:free) with a 200 requests-per-day limit. This is perfect for those who want the power of a high-precision coding agent without running a local server. Simply swap your ANTHROPIC_BASE_URL to OpenRouter’s endpoint.

What This Means for You

For small business owners and solo founders, this is the end of "token anxiety." You can now:

  • Run background agents: Let an agent refactor an entire legacy module overnight for $0.
  • Own your infrastructure: Keep your proprietary code on your own hardware, as discussed in our Local AI Box guide.
  • Scale without limits: Deploy ten agents to work on different tasks simultaneously without worrying about the bill.

The Verdict: While Claude 3.5 Sonnet remains the gold standard for complex logic, the Gemma 4 + Local CLI combo is now the "good enough" baseline for 80% of daily coding work.


Q: Do I need a paid Anthropic account?
A: No. While the Claude Code CLI is distributed by Anthropic, pointing it to a local base URL bypasses the need for their paid API credits.

Q: Which Mac hardware is best for this?
A: Any M-series chip (M1-M4) works, but you'll want at least 32GB of Unified Memory to run the high-quality 31B model smoothly.

Q: Is it safe to use?
A: Yes. Running locally is the most secure way to use AI, as your source code never leaves your machine.

Q: Does it support tool use (running tests, reading files)?
A: Yes. Gemma 4 has native support for function calling, which Claude Code uses to interact with your terminal and filesystem.


Sources:

  • Ollama Blog: Faster Gemma 4 on MLX (June 2026)
  • Google DeepMind: Gemma 4 Technical Report
  • OpenRouter: Free Model Directory

Updates & Corrections:

  • 2026-07-03: Initial release; verified Gemma 4 speed benchmarks on M3 Max.

Get the practical AI brief

Verified, no-hype AI tips you can actually use - in your inbox. Free.

No spam. We verify what we send. Unsubscribe anytime.

Discussion

0 comments
Sham

Sham

AI Engineer & Founder, The Tech Archive

AI engineer (Azure AI-102/AI-900). Writes practical, tested, hype-free guides on using AI for real work and small business at The Tech Archive.

Related Articles

View all
The One-Shot Studio: How Claude Fable 5 Replaced Software & Game Agencies
AI for Small Business

The One-Shot Studio: How Claude Fable 5 Replaced Software & Game Agencies

5 min
The AI Wealth Window: Why the Next 12 Months are the 'Cheap Infrastructure' Era for Founders
AI for Small Business

The AI Wealth Window: Why the Next 12 Months are the 'Cheap Infrastructure' Era for Founders

6 min
The Agentic OS: Why You Should Stop Prompting and Start Designing Loops
AI for Small Business

The Agentic OS: Why You Should Stop Prompting and Start Designing Loops

5 min
Beyond the Chatbot: The Rise of the Omnipresent AI Teammate
AI for Small Business

Beyond the Chatbot: The Rise of the Omnipresent AI Teammate

6 min
Bridging the $530B Gap: How Voice AI is Unlocking Formal Credit for India’s MSMEs
AI for Small Business

Bridging the $530B Gap: How Voice AI is Unlocking Formal Credit for India’s MSMEs

5 min
The AI-First Business Strategy: 40 Brutal Truths for 2026
AI for Small Business

The AI-First Business Strategy: 40 Brutal Truths for 2026

5 min