Verdict: On deep-research tasks, OpenRouter Fusion can beat Claude Fable 5 working alone. In OpenRouter's own run of the DRACO benchmark, a Fable 5 + GPT-5.5 panel fused by Claude Opus 4.8 scored 69.0%, while Fable 5 solo scored 65.3% (on 93 of 100 tasks). A budget panel of Gemini 3 Flash, Kimi K2.6, and DeepSeek V4 Pro reached 64.7%, within one point of Fable 5 at a lower price. But Fusion is not a universal replacement: it costs more per call than a single model, adds latency, and its benchmark edge is on research-style synthesis, not coding or simple Q&A.
Last verified: 2026-06-17 · Best for: deep research, expert critique, high-stakes analysis · Avoid for: quick chat, low-latency, or routine coding · Pricing: cumulative — you pay for every panel call plus the judge
What is OpenRouter Fusion?
OpenRouter Fusion is a server-side "model panel" feature. Instead of sending your prompt to one model and getting one answer back, Fusion sends it to several models at the same time. A separate judge model reads all the responses, compares them, and writes one final answer based on where the models agree, disagree, and what each one uniquely caught.
Fusion is available through OpenRouter in four ways:
- Model alias: send "model": "openrouter/fusion" in any OpenAI-compatible client.
- Server tool: add { "type": "openrouter:fusion" } to your tools array and let your outer model decide when to invoke it.
- Plugin: configure a custom panel through the Fusion plugin.
- Chatroom: test panels interactively at openrouter.ai/fusion.
Each panel model runs with openrouter:web_search, openrouter:web_fetch, and openrouter:bash enabled, so the panel can pull live sources while it answers. The judge then returns structured analysis covering consensus, contradictions, partial coverage, unique insights, and blind spots, and the outer model writes the final response from that analysis.
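The flow described above (fan the prompt out to a panel, then hand every answer to a judge) can be sketched in a few lines of Python. This is a minimal illustration of the pattern, not OpenRouter's implementation: the `run_panel` and `fuse` names, the judge prompt wording, and the use of plain callables in place of real model calls are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

# A "model" here is any callable that maps a prompt to an answer.
# Real panel members would be API calls; plain functions stand in for them.
Model = Callable[[str], str]


def run_panel(prompt: str, panel: Dict[str, Model]) -> Dict[str, str]:
    """Send the same prompt to every panel model concurrently."""
    if not panel:
        raise ValueError("panel must contain at least one model")
    with ThreadPoolExecutor(max_workers=len(panel)) as pool:
        futures = {name: pool.submit(model, prompt) for name, model in panel.items()}
        # result() blocks, so this returns only after the slowest model finishes.
        return {name: future.result() for name, future in futures.items()}


def fuse(prompt: str, panel: Dict[str, Model], judge: Model) -> str:
    """Collect the panel answers and ask the judge to analyse them together."""
    answers = run_panel(prompt, panel)
    labelled = "\n\n".join(f"[{name}]\n{text}" for name, text in answers.items())
    # Hypothetical judge prompt; the categories mirror the ones listed above.
    judge_prompt = (
        f"Question:\n{prompt}\n\n"
        f"Panel answers:\n{labelled}\n\n"
        "Summarize consensus, contradictions, partial coverage, "
        "unique insights, and blind spots."
    )
    return judge(judge_prompt)
```

Because `run_panel` waits on every future, the total wall-clock time in this sketch is roughly the slowest panel call plus the judge call.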
This is not a new academic idea — multi-model aggregation (Mixture-of-Agents) has appeared in research for over a year — but OpenRouter productizes it as a single API call with unified billing and no orchestration code to maintain.
How the benchmark was run
OpenRouter tested Fusion on the DRACO benchmark, a 100-task deep-research suite created by Perplexity AI and published on arXiv. DRACO tasks span ten domains (law, medicine, finance, tech, UX design, product comparison, and others) and are graded across roughly 39 weighted criteria in four categories: factual accuracy, breadth and depth, presentation quality, and citation quality. Wrong answers can carry negative weight, so verbose but inaccurate answers do not inflate scores.
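To see why negative weights keep padded answers from scoring well, here is a small sketch of weighted rubric scoring. The formula is an assumption for illustration (points earned divided by the maximum positive points, clamped to the 0 to 1 range); DRACO's exact normalization is not described here, and the weights below are made up.

```python
from typing import List, Tuple

# (weight, triggered): positive weights reward content the answer includes,
# negative weights penalize errors the answer commits. Illustrative values only.
Criterion = Tuple[float, bool]


def rubric_score(criteria: List[Criterion]) -> float:
    """Assumed scoring rule: earned points over max positive points, clamped to [0, 1]."""
    max_points = sum(weight for weight, _ in criteria if weight > 0)
    if max_points == 0:
        raise ValueError("rubric needs at least one positively weighted criterion")
    earned = sum(weight for weight, triggered in criteria if triggered)
    return max(0.0, min(1.0, earned / max_points))


# Both answers cover the two positive criteria; the second also makes a
# factual error that triggers the negatively weighted criterion.
accurate = [(3, True), (2, True), (-4, False)]
padded_but_wrong = [(3, True), (2, True), (-4, True)]

print(rubric_score(accurate), rubric_score(padded_but_wrong))
```

Under this assumed rule, the second answer loses most of its score despite covering the same ground, which is the behavior the paragraph above describes.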
OpenRouter used Gemini 3.1 Pro Preview as the judge rather than the Gemini 3 Pro judge from the original DRACO paper, so the absolute scores are not directly comparable to the paper's published numbers. The test was designed to show relative differences between Fusion panels and solo models.
All configurations — solo and panel — had access to the same three tools: openrouter:web_search (via Exa), openrouter:web_fetch (via Exa), and openrouter:bash. OpenRouter also blocked sites that hosted the benchmark answer key to prevent the models from "cheating" by looking up answers.
The numbers: Fusion panels vs solo models
| Configuration | Models | DRACO score | Notes |
|---|---|---|---|
| Fusion (frontier panel) | Fable 5 + GPT-5.5, judged by Opus 4.8 | 69.0% | Highest score; Fable 5 was still available at test time |
| Fusion (frontier panel) | Opus 4.8 + GPT-5.5 + Gemini 3.1 Pro, judged by Opus 4.8 | 68.3% | No Fable 5 required |
| Fusion (frontier panel) | Opus 4.8 + GPT-5.5, judged by Opus 4.8 | 67.6% | Two-model frontier panel |
| Fusion (same-model panel) | Opus 4.8 + Opus 4.8, judged by Opus 4.8 | 65.5% | 6.7 points above solo Opus 4.8 |
| Solo | Claude Fable 5 | 65.3% | 7 of 100 tasks blocked by content filters |
| Fusion (budget panel) | Gemini 3 Flash + Kimi K2.6 + DeepSeek V4 Pro, judged by Opus 4.8 | 64.7% | Within 0.6 points of Fable 5 |
| Solo | DeepSeek V4 Pro | 60.3% | — |
| Solo | GPT-5.5 | 60.0% | — |
| Solo | Claude Opus 4.8 | 58.8% | — |
| Solo | Kimi K2.6 | 53.7% | — |
| Solo | Gemini 3.1 Pro | 45.4% | — |
| Solo | Gemini 3 Flash | 43.1% | — |
Source: OpenRouter "Surpassing Frontier Performance with Fusion" blog post, published June 12, 2026 and updated June 14, 2026.
Three results stand out. First, every frontier panel beat the best solo model, Fable 5 at 65.3%. Second, the budget panel nearly matched Fable 5 while using three substantially cheaper models. Third, and most surprising, running Opus 4.8 twice and judging it with itself still produced a 6.7-point gain over a single Opus 4.8 call. That suggests a meaningful part of Fusion's lift comes from the synthesis step itself, not just from mixing different model architectures.
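One way to read the table is to compare each panel with the best solo score among its own members. The sketch below does that arithmetic using only the scores listed above; the `lift_over_best_member` helper and the panel labels are names chosen for this example.

```python
# Solo DRACO scores from the table above (percent).
SOLO = {
    "Claude Fable 5": 65.3,
    "DeepSeek V4 Pro": 60.3,
    "GPT-5.5": 60.0,
    "Claude Opus 4.8": 58.8,
    "Kimi K2.6": 53.7,
    "Gemini 3.1 Pro": 45.4,
    "Gemini 3 Flash": 43.1,
}

# Panel score and panel members, also from the table above.
PANELS = {
    "Fable 5 + GPT-5.5": (69.0, ["Claude Fable 5", "GPT-5.5"]),
    "Opus 4.8 + GPT-5.5 + Gemini 3.1 Pro": (
        68.3,
        ["Claude Opus 4.8", "GPT-5.5", "Gemini 3.1 Pro"],
    ),
    "Opus 4.8 + GPT-5.5": (67.6, ["Claude Opus 4.8", "GPT-5.5"]),
    "Opus 4.8 + Opus 4.8": (65.5, ["Claude Opus 4.8", "Claude Opus 4.8"]),
    "Budget panel": (64.7, ["Gemini 3 Flash", "Kimi K2.6", "DeepSeek V4 Pro"]),
}


def lift_over_best_member(panel_score: float, members: list) -> float:
    """Points gained over the strongest solo model inside the panel."""
    best_solo = max(SOLO[name] for name in members)
    return round(panel_score - best_solo, 1)


LIFTS = {
    name: lift_over_best_member(score, members)
    for name, (score, members) in PANELS.items()
}

for name, lift in LIFTS.items():
    print(f"{name}: +{lift} points over its best member")
```

By this measure the lifts range from 3.7 points (Fable 5 + GPT-5.5 over Fable 5) to 8.3 points (the three-model frontier panel over GPT-5.5), with the caveat that Fable 5's solo score covers only 93 tasks.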
What does Fusion cost in practice?
Fusion pricing is cumulative. You pay for every model in the panel, plus the judge call, plus your outer completion. With a three-model panel, expect roughly 4–5× the cost of a single completion on the same prompt. The exact multiplier depends on which models you pick.
List prices on OpenRouter (per 1 million tokens) at the time of writing:
| Model | Input | Output |
|---|---|---|
| Claude Fable 5 | $10.00 | $50.00 |
| GPT-5.5 | $5.00 | $30.00 |
| Claude Opus 4.8 | $5.00 | $25.00 |
| Gemini 3 Flash Preview | $0.50 | $3.00 |
| Kimi K2.6 | $0.68 | $3.41 |
| DeepSeek V4 Pro | $0.435 | $0.87 |
Source: OpenRouter model pages for each listed model, accessed June 17, 2026.
The "half the cost of Fable 5" claim applies only to the budget panel, which reaches near-Fable quality with cheaper models; it does not apply to the frontier panel. The frontier Fable 5 + GPT-5.5 panel is more expensive than using Fable 5 alone because you are paying for two frontier completions plus the judge. Fusion's economic argument is cost-per-quality-unit, not cost-per-request.
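A rough way to estimate the cumulative bill is to price each panel call and the judge call from the list prices in the table above. The sketch below makes several assumptions that are not from OpenRouter: the token counts are placeholders, the judge is assumed to read the prompt plus every panel answer, and the outer completion and any tool charges (web search, fetch) are left out.

```python
# List prices from the table above, in USD per 1M tokens: (input, output).
PRICES = {
    "Claude Fable 5": (10.00, 50.00),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
    "Gemini 3 Flash Preview": (0.50, 3.00),
    "Kimi K2.6": (0.68, 3.41),
    "DeepSeek V4 Pro": (0.435, 0.87),
}


def call_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Cost in USD of one completion at list price."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


def fusion_cost(panel: list, judge: str, prompt_tokens: int, answer_tokens: int) -> float:
    """Panel calls plus one judge call (outer completion and tool fees omitted)."""
    panel_total = sum(call_cost(m, prompt_tokens, answer_tokens) for m in panel)
    # Assumption: the judge's input is the prompt plus every panel answer.
    judge_input = prompt_tokens + answer_tokens * len(panel)
    return panel_total + call_cost(judge, judge_input, answer_tokens)


# Placeholder token counts, not measured values.
PROMPT_TOKENS, ANSWER_TOKENS = 2_000, 4_000

solo_fable = call_cost("Claude Fable 5", PROMPT_TOKENS, ANSWER_TOKENS)
frontier = fusion_cost(
    ["Claude Fable 5", "GPT-5.5"], "Claude Opus 4.8", PROMPT_TOKENS, ANSWER_TOKENS
)
budget = fusion_cost(
    ["Gemini 3 Flash Preview", "Kimi K2.6", "DeepSeek V4 Pro"],
    "Claude Opus 4.8",
    PROMPT_TOKENS,
    ANSWER_TOKENS,
)

print(f"solo Fable 5: ${solo_fable:.3f}")
print(f"frontier panel: ${frontier:.3f}")
print(f"budget panel: ${budget:.3f}")
```

The token counts are placeholders; substitute measured usage from your own requests before drawing conclusions about any particular panel.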
Important caveats you should know
Fable 5 is currently suspended. Anthropic suspended access to Claude Fable 5 and Mythos 5 on June 12, 2026, following a US government export-control directive. This means the exact Fable 5 + GPT-5.5 panel that topped the benchmark cannot be reproduced today. The finding still matters because it shows what a well-chosen panel can achieve, and the panels without Fable 5 held up: the three Opus 4.8-based panels scored above every solo model, Fable 5 included, and the budget panel scored above every solo model except Fable 5.
One benchmark is not every workload. DRACO measures research-style synthesis across text-only, English-only tasks. OpenRouter explicitly says Fusion is not a drop-in replacement for coding models, and that DRACO does not include the long-horizon tasks where Fable 5 shines. For routine coding, quick chat, or simple extraction, one good model is usually faster and cheaper.
Latency is higher. Fusion waits for every panel model to finish, then runs the judge, then returns the final answer. That is inherently slower than a single model call.
The Fable 5 solo score is on 93 tasks, not 100. Fable 5's content filters blocked 7 of the 100 DRACO tasks, so its 65.3% reflects 93 scored tasks. OpenRouter notes that this makes direct comparisons against models that completed all 100 tasks "slightly uneven."
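To get a feel for how much the seven blocked tasks could matter, the sketch below bounds what Fable 5's score might have been over all 100 tasks. It rests on two assumptions that are not confirmed by the source: that the overall score is an unweighted mean of per-task scores, and that each task score falls between 0 and 100. Treat the output as a sensitivity check, not a reported result.

```python
def full_suite_bounds(score_on_scored: float, scored: int, total: int,
                      lowest: float = 0.0, highest: float = 100.0) -> tuple:
    """Bound the mean over all tasks if the unscored ones had been scored.

    Assumes the reported score is a plain average of per-task scores and
    that every task score lies between `lowest` and `highest`.
    """
    blocked = total - scored
    points = score_on_scored * scored
    lower = (points + lowest * blocked) / total
    upper = (points + highest * blocked) / total
    return round(lower, 1), round(upper, 1)


print(full_suite_bounds(65.3, scored=93, total=100))
```

Under those assumptions the range is wide enough to overlap several panel scores, which is one way to read OpenRouter's "slightly uneven" caveat.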
How to use Fusion in your own stack
The simplest integration is the model alias:
```json
{
  "model": "openrouter/fusion",
  "messages": [
    { "role": "user", "content": "Compare the strongest arguments for and against a carbon tax, with citations." }
  ]
}
```
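For readers who want to send that body from code, here is a minimal sketch using only the Python standard library against OpenRouter's OpenAI-compatible chat completions endpoint. The helper names (`build_fusion_request`, `ask_fusion`) and the 300-second timeout are choices made for this example, and reading the reply from `choices[0].message.content` assumes Fusion returns the usual OpenAI-compatible response shape.

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"


def build_fusion_request(prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that uses the Fusion model alias."""
    payload = {
        "model": "openrouter/fusion",
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def ask_fusion(prompt: str, api_key: str, timeout: float = 300.0) -> str:
    """Send the request and return the final answer text.

    The generous timeout is a guess: Fusion waits on several models plus a judge.
    Assumes an OpenAI-compatible response body.
    """
    request = build_fusion_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]
```

Calling `ask_fusion("...", api_key)` with a key read from an environment variable of your choosing keeps credentials out of source code.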
If you want more control, use the server tool form and pick your own outer model:
```json
{
  "model": "~anthropic/claude-opus-latest",
  "messages": [
    { "role": "user", "content": "..." }
  ],
  "tools": [
    {
      "type": "openrouter:fusion",
      "parameters": {
        "analysis_models": [
          "~google/gemini-flash-latest",
          "deepseek/deepseek-v3.2",
          "~moonshotai/kimi-latest"
        ],
        "model": "~anthropic/claude-opus-latest"
      }
    }
  ]
}
```
Panel sizes can range from 1 to 8 models. To force Fusion on every request, set tool_choice: "required". Otherwise, the outer model decides whether the prompt benefits from multiple perspectives. OpenRouter recommends Fusion for research questions, expert critique, compare-and-contrast prompts, and anything where the cost of being wrong outweighs the cost of extra completions.
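The server-tool request can be assembled programmatically so the panel-size limit and the tool_choice switch are handled in one place. This is a sketch: `build_fusion_tool_payload` is a hypothetical helper, and treating the inner `model` parameter as the judge is an inference from the example above, not something the source states outright.

```python
from typing import Any, Dict, List


def build_fusion_tool_payload(
    prompt: str,
    outer_model: str,
    analysis_models: List[str],
    judge_model: str,
    force: bool = False,
) -> Dict[str, Any]:
    """Build a chat-completions body that exposes Fusion as a server tool."""
    # Panel sizes of 1 to 8 models are allowed, per the text above.
    if not 1 <= len(analysis_models) <= 8:
        raise ValueError("Fusion panels take between 1 and 8 models")

    payload: Dict[str, Any] = {
        "model": outer_model,
        "messages": [{"role": "user", "content": prompt}],
        "tools": [
            {
                "type": "openrouter:fusion",
                "parameters": {
                    "analysis_models": list(analysis_models),
                    # Assumed to select the judge model.
                    "model": judge_model,
                },
            }
        ],
    }
    if force:
        # Forces the outer model to invoke Fusion on every request.
        payload["tool_choice"] = "required"
    return payload


payload = build_fusion_tool_payload(
    prompt="...",
    outer_model="~anthropic/claude-opus-latest",
    analysis_models=[
        "~google/gemini-flash-latest",
        "deepseek/deepseek-v3.2",
        "~moonshotai/kimi-latest",
    ],
    judge_model="~anthropic/claude-opus-latest",
    force=True,
)
```

Leaving `force` at its default keeps the decision with the outer model, which is the cheaper setting when only some prompts benefit from a panel.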
What this means for you
If you are building AI workflows for a small business or a product team, Fusion is best treated as a specialist tool, not a default model. Three practical rules:
- Use Fusion when the answer really matters. Deep research, competitive analysis, due diligence, and expert critique are the sweet spots. The budget panel gives you near-frontier research quality without a frontier bill.
- Do not replace your everyday model with it. For customer-support replies, routine content drafting, or standard coding tasks, a single fast model is cheaper and lower-latency.
- Build with model-agnostic fallbacks. The Fable 5 suspension is a reminder that access to any single frontier model can change overnight. Designing prompts and pipelines so the model slug is a configuration value, not hard-coded logic, makes your stack more resilient. We cover that pattern in our guide to building a Claude Agent OS for small business.
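The third rule, keeping model slugs in configuration, can be sketched as a preference-ordered list of panels with a simple availability check. Everything here is illustrative: the config layout, the `pick_panel` helper, and the `example/frontier-model` slug are hypothetical, and how you learn that a model is unavailable (an error response, a status page, a manual flag) is left to your stack.

```python
import json
from typing import Iterable, List, Optional

# Hypothetical configuration, in order of preference. In practice this would
# live in a file or environment variable rather than in the source code.
CONFIG_JSON = """
{
  "judge": "~anthropic/claude-opus-latest",
  "panels": [
    ["example/frontier-model", "~anthropic/claude-opus-latest"],
    ["~google/gemini-flash-latest", "deepseek/deepseek-v3.2", "~moonshotai/kimi-latest"]
  ]
}
"""


def pick_panel(config: dict, unavailable: Iterable[str]) -> Optional[List[str]]:
    """Return the first configured panel with no unavailable member, else None."""
    blocked = set(unavailable)
    for panel in config["panels"]:
        if not blocked.intersection(panel):
            return panel
    return None


config = json.loads(CONFIG_JSON)

# If the preferred frontier model is suspended, the next panel is used instead.
panel = pick_panel(config, unavailable={"example/frontier-model"})
print(panel)
```

Swapping a suspended model then becomes a configuration change rather than a code change.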
For cost-conscious builders, Fusion's budget panel is the most interesting result: it suggests you can get close to Fable-level research output (64.7% versus 65.3% on DRACO) for a fraction of the per-token price, provided you accept the latency and the cumulative billing model.
FAQ
Q: What is OpenRouter Fusion?
A: Fusion is an OpenRouter feature that sends a prompt to a panel of models in parallel, then uses a judge model to synthesize their responses into one final answer. It is accessible as the openrouter/fusion model alias, a server tool, a plugin, or an interactive chatroom.
Q: Did OpenRouter Fusion officially beat Fable 5?
A: In OpenRouter's run of the DRACO benchmark, a Fable 5 + GPT-5.5 panel scored 69.0%, which is higher than Fable 5's solo score of 65.3%. However, Fable 5 completed only 93 of the 100 tasks due to content-filter blocks, and DRACO is a research-only benchmark, so the result does not generalize to every workload.
Q: Why would running the same model twice help?
A: Because of sampling variability, the same model can take different reasoning paths and pull different sources across two runs. The judge then extracts the best parts of both. OpenRouter found that Opus 4.8 paired with itself scored 65.5%, a 6.7-point gain over solo Opus 4.8 at 58.8%.
Q: Is Fusion cheaper than using a single frontier model?
A: It depends on the panel. The budget panel (Gemini 3 Flash + Kimi K2.6 + DeepSeek V4 Pro) reached near-Fable performance at roughly half the cost of a premium solo model. A frontier panel costs more per request because you pay for every panel call plus the judge.
Q: Can I use Fusion for coding?
A: OpenRouter says Fusion is not a drop-in replacement for coding models. It works better as a research or architecture-decision tool that a coding model can invoke selectively, rather than as the main code-generation layer.
Q: Is Fable 5 still available?
A: No. Anthropic suspended access to Fable 5 and Mythos 5 on June 12, 2026. OpenRouter's blog and Anthropic's release notes both confirm the suspension. No restoration date has been announced.