Verdict: North Mini Code is the first specialized "agentic" coding model that balances a small active footprint (3B) with high reasoning performance (30B MoE). For developers and small businesses using Hermes Agent, it provides a cost-free, high-speed alternative to frontier models for terminal tasks and software engineering.
Last verified: 2026-06-20
Best for: Agentic coding, terminal automation, and low-cost 24/7 research bots.
Key Specs: 30B Sparse MoE (3B active), 256K Context, Apache 2.0 License.
What is North Mini Code?
Released on June 9, 2026, North Mini Code is Cohere Labs’ first open-weight model purpose-built for the developer community. It belongs to the new "North" family of models, designed to move beyond simple chat completion and into the realm of agentic software engineering.
Unlike generalist models, North Mini Code is a sparse Mixture-of-Experts (MoE) architecture. It has 30 billion total parameters, but only activates 3 billion parameters per token. This design allows for the reasoning depth of a mid-sized model with the speed and low latency of a small model, making it ideal for the multi-turn "think-act-verify" loops required by AI agents.
How to Get the North Mini Code API for Free
The most accessible way to use North Mini Code today is through the OpenRouter Free API. OpenRouter provides a hosted version of the model that costs $0.00 per million tokens (as of June 2026), making it a perfect "free tier" for autonomous workflows.
- OpenRouter: Search for
cohere/north-mini-code:free. - Cohere Model Vault: Managed inference with production-grade rate limits.
- Local Deployment: Available via Ollama or Hugging Face (requires ~1x H100 GPU for FP8/FP4 precision).
Setting Up North Mini Code in Hermes Agent
To wire North Mini Code into Hermes Agent, you should create a dedicated agent profile. This allows you to route specific "grind" tasks to the free model while keeping your frontier models (like Claude 3.7) for complex architectural decisions.
1. Configure the Provider
Set up OpenRouter as a provider in your Hermes configuration:
hermes model set OpenRouter
2. Create the North Mini Profile
Run this command to spin up a specialized profile:
hermes profile create north-mini --model "cohere/north-mini-code:free"
Once created, you can delegate tasks specifically to this profile. Because the API is free, you can let North Mini run 24/7 on Kanban tasks without worrying about your token budget.
Why North Mini Code Wins for Agentic Tasks
Most "Mini" models struggle with complex tool use or multi-step reasoning. North Mini Code solves this by being trained against multiple agent harnesses rather than just static datasets. It was optimized using Reinforcement Learning with Verifiable Rewards (RLVR) against frameworks like SWE-Agent and OpenCode.
| Metric | North Mini Code (30B-A3B) | Gemma 4 (26B-A4B) | Qwen 3.5 (35B-A3B) |
|---|---|---|---|
| Artificial Analysis Coding Index | 33.4 | 31.2 | 30.1 |
| SWE-bench Verified | 61.0 | 58.5 | 56.2 |
| HumanEval (Pass@1) | 78.4% | 76.1% | 75.8% |
| Context Window | 256K | 128K | 128K |
Source: Cohere Labs Internal Benchmarks (June 2026).
What this means for you
In 2026, the winning strategy for AI-driven business is Model Routing. Don't waste your expensive frontier model tokens on repetitive terminal tasks, file searching, or basic refactoring.
Route the "manual labor" of software engineering to North Mini Code. It can run in the background 24/7, managing your local AI assistant infrastructure, while you save your premium models for the high-level strategy and complex debugging that actually requires a "Frontier" brain.
Related reading
FAQ
Q: Is North Mini Code truly free?
A: Yes, the model weights are Apache 2.0, meaning you can run it locally for free forever. It is also currently offered as a free API on OpenRouter and through Cohere's trial keys.
Q: Does it support tool use and JSON?
A: Yes. North Mini Code is natively optimized for interleaved reasoning and tool use via JSON schema, allowing it to "think" before it calls a tool.
Q: Can it handle large codebases?
A: With a 256K context window, it can ingest significant portions of a repository at once, which is a major advantage over other small coding models.
Q: How do I run it locally?
A: Use ollama run north-mini-code. You will need at least 24GB of VRAM (like an RTX 4090 or Mac Studio) for a quantized version, or a dedicated H100 for full FP8 performance.
Discussion
0 comments