The Tech ArchiveThe Tech ArchiveThe Tech Archive
ArticlesTopicsSeriesAbout

Get the practical AI brief

Verified, no-hype AI tips you can actually use - in your inbox. Free.

No spam. We verify what we send. Unsubscribe anytime.

The Tech ArchiveThe Tech Archive

The Tech Archive

AI news, analysis & explainers

AboutArticlesTopicsSeriesPages

© 2026 All rights reserved.

Back to home
0 readers reading
  1. Home
  2. Articles
  3. Artificial Intelligence
  4. How to Use North Mini Code: The Free Agentic Coding Model for Hermes Agent

Contents

How to Use North Mini Code: The Free Agentic Coding Model for Hermes Agent
Artificial Intelligence

How to Use North Mini Code: The Free Agentic Coding Model for Hermes Agent

Cohere’s North Mini Code is a free, 30B MoE model built for agentic coding. Learn how to run it in Hermes Agent for 24/7 autonomous software work.

Sham

Sham

AI Engineer & Founder, The Tech Archive

5 min read
0 views
June 19, 2026

Verdict: North Mini Code is the first specialized "agentic" coding model that balances a small active footprint (3B) with high reasoning performance (30B MoE). For developers and small businesses using Hermes Agent, it provides a cost-free, high-speed alternative to frontier models for terminal tasks and software engineering.

Last verified: 2026-06-20
Best for: Agentic coding, terminal automation, and low-cost 24/7 research bots.
Key Specs: 30B Sparse MoE (3B active), 256K Context, Apache 2.0 License.

What is North Mini Code?

Released on June 9, 2026, North Mini Code is Cohere Labs’ first open-weight model purpose-built for the developer community. It belongs to the new "North" family of models, designed to move beyond simple chat completion and into the realm of agentic software engineering.

Unlike generalist models, North Mini Code is a sparse Mixture-of-Experts (MoE) architecture. It has 30 billion total parameters, but only activates 3 billion parameters per token. This design allows for the reasoning depth of a mid-sized model with the speed and low latency of a small model, making it ideal for the multi-turn "think-act-verify" loops required by AI agents.

How to Get the North Mini Code API for Free

The most accessible way to use North Mini Code today is through the OpenRouter Free API. OpenRouter provides a hosted version of the model that costs $0.00 per million tokens (as of June 2026), making it a perfect "free tier" for autonomous workflows.

  • OpenRouter: Search for cohere/north-mini-code:free.
  • Cohere Model Vault: Managed inference with production-grade rate limits.
  • Local Deployment: Available via Ollama or Hugging Face (requires ~1x H100 GPU for FP8/FP4 precision).

Setting Up North Mini Code in Hermes Agent

To wire North Mini Code into Hermes Agent, you should create a dedicated agent profile. This allows you to route specific "grind" tasks to the free model while keeping your frontier models (like Claude 3.7) for complex architectural decisions.

1. Configure the Provider

Set up OpenRouter as a provider in your Hermes configuration:

hermes model set OpenRouter

2. Create the North Mini Profile

Run this command to spin up a specialized profile:

hermes profile create north-mini --model "cohere/north-mini-code:free"

Once created, you can delegate tasks specifically to this profile. Because the API is free, you can let North Mini run 24/7 on Kanban tasks without worrying about your token budget.

Why North Mini Code Wins for Agentic Tasks

Most "Mini" models struggle with complex tool use or multi-step reasoning. North Mini Code solves this by being trained against multiple agent harnesses rather than just static datasets. It was optimized using Reinforcement Learning with Verifiable Rewards (RLVR) against frameworks like SWE-Agent and OpenCode.

Metric North Mini Code (30B-A3B) Gemma 4 (26B-A4B) Qwen 3.5 (35B-A3B)
Artificial Analysis Coding Index 33.4 31.2 30.1
SWE-bench Verified 61.0 58.5 56.2
HumanEval (Pass@1) 78.4% 76.1% 75.8%
Context Window 256K 128K 128K

Source: Cohere Labs Internal Benchmarks (June 2026).

What this means for you

In 2026, the winning strategy for AI-driven business is Model Routing. Don't waste your expensive frontier model tokens on repetitive terminal tasks, file searching, or basic refactoring.

Route the "manual labor" of software engineering to North Mini Code. It can run in the background 24/7, managing your local AI assistant infrastructure, while you save your premium models for the high-level strategy and complex debugging that actually requires a "Frontier" brain.

Related reading

  • North Mini Code vs Gemma 4 showdown

FAQ

Q: Is North Mini Code truly free?
A: Yes, the model weights are Apache 2.0, meaning you can run it locally for free forever. It is also currently offered as a free API on OpenRouter and through Cohere's trial keys.

Q: Does it support tool use and JSON?
A: Yes. North Mini Code is natively optimized for interleaved reasoning and tool use via JSON schema, allowing it to "think" before it calls a tool.

Q: Can it handle large codebases?
A: With a 256K context window, it can ingest significant portions of a repository at once, which is a major advantage over other small coding models.

Q: How do I run it locally?
A: Use ollama run north-mini-code. You will need at least 24GB of VRAM (like an RTX 4090 or Mac Studio) for a quantized version, or a dedicated H100 for full FP8 performance.

Sources
  • Cohere Blog: North Mini Code Announcement (June 9, 2026)
  • Hugging Face: North-Mini-Code-1.0 Model Card
  • OpenRouter: Model Documentation & Pricing
Updates & Corrections
  • 2026-06-20: Article published. Fact-checked architecture (30B MoE) and benchmarks vs Gemma 4.

Get the practical AI brief

Verified, no-hype AI tips you can actually use - in your inbox. Free.

No spam. We verify what we send. Unsubscribe anytime.

Discussion

0 comments
Sham

Sham

AI Engineer & Founder, The Tech Archive

AI engineer (Azure AI-102/AI-900). Writes practical, tested, hype-free guides on using AI for real work and small business at The Tech Archive.

Related Articles