Verdict: GLM 5.2 (Z.ai, June 13 2026) won four out of five hands-on build tests against Claude Opus 4.8 (Anthropic, May 28 2026) in my comparison, but the bigger takeaway is not "China beats America." It is that the best setup is a two-model workflow: GLM 5.2 for fast, creative, flat-fee building, and Claude Opus 4.8 for careful, high-stakes finishing. Neither one is a clean replacement for the other, and the model you pick today will likely be eclipsed next month anyway.
Last verified: 2026-06-18 · GLM 5.2: 4/5 wins in build quality · Claude Opus 4.8: 1/5 wins, stronger on polished long-horizon tasks · Best move: run both with a routing layer, not a single-model lock-in
Both models are brand-new frontiers. GLM 5.2 is a 744-billion-parameter, Mixture-of-Experts (MoE) coding model from Zhipu AI, released as the international-facing Z.ai brand, with a claimed 1 million-token context and open-weights/MIT licensing promised the week after launch source: Z.ai DevPack docs. Claude Opus 4.8 is Anthropic's newest flagship, shipped at the same $5/$25 per-million-token API price as Opus 4.7, with a 1M-token context and new "dynamic workflows" in Claude Code source: Anthropic release.
The test setup: same prompt, same job, different models
I ran the same five build prompts through both models with no special tuning. The goal was to see which produced the more useful, runnable, finished artifact for ordinary builder work: games, landing pages, and data visualisations.
The five tasks were:
- Browser game: a simple runner — dodge blocks, collect coins, speed increases over time.
- Landing page: a clean, Apple-style launch page for a fictional product.
- Physics animation: liquid sloshing inside a bowl, with interactive colour controls.
- Arcade game: a neon arcade-style game with score, movement, and visual flair.
- Solar-system map: an animated, accurate-enough map of planets in motion.
All five prompts were written to be practical, not academic. I judged output on completeness, polish, interactivity, and how little rework the result needed.
The results: GLM 5.2 won the build quality battle
| # | Task | GLM 5.2 | Claude Opus 4.8 | Winner |
|---|---|---|---|---|
| 1 | Runner game | Alive feel, smooth speed ramp, fun to play | Worked, but flat and lifeless | GLM 5.2 |
| 2 | Apple-style landing page | Premium look, working menu, smooth scroll | Fewer elements, visually flat | GLM 5.2 |
| 3 | Liquid-in-bowl physics | Colour controls, smooth animation, alive | Faded quickly, dull motion | GLM 5.2 |
| 4 | Neon arcade game | Wild, fun, visually distinctive | Buggy movement, run-of-the-mill | GLM 5.2 |
| 5 | Solar-system map | Competent but plain | Cleaner, better-looking, more accurate | Claude Opus 4.8 |
Score: GLM 5.2 4 — Claude Opus 4.8 1.
These are my own hands-on observations, not independent benchmarks. Z.ai did not publish official benchmark scores at launch, and no third-party evaluation was available as of the release date source: Z.ai status update, reported by observers. That matters, because benchmark numbers are the normal way the industry compares models. Without them, the only honest evidence is direct testing like this — and even that is one person's prompts, one person's taste.
What GLM 5.2 actually is
GLM 5.2 is the latest flagship from Zhipu AI, a Beijing-based lab that uses the Z.ai brand internationally. The headline specs, per Z.ai's own documentation and partner pages, are:
- 744 billion total parameters, 40 billion active per token (MoE)
- 1 million token context window (model ID suffix
[1m]) - Up to 131,072 output tokens
- MIT-licensed open weights promised for release the week after launch
- Anthropic-compatible API endpoint, so it works inside Claude Code, Cline, OpenClaw, and similar tools
It is positioned as a coding-first model for long-horizon agentic tasks: write, run, revise, and deploy code across whole repositories source: Z.ai docs. The pricing is subscription-based through the GLM Coding Plan rather than per-token: Lite starts at about $12.60/month (yearly), Pro at $50.40/month, and Max at $112/month source: Z.ai subscribe page.
That flat-fee model is the real commercial hook. Claude Opus 4.8, by contrast, charges $5 per million input tokens and $25 per million output tokens at standard speed, with a fast mode at $10/$50 per million tokens source: Anthropic. If you run a model all day inside an IDE, those tokens add up fast. If you build a lot, a flat monthly plan can be dramatically cheaper — but only if the model is good enough for the work.
What Claude Opus 4.8 still does better
Claude Opus 4.8 did win the solar-system map task cleanly: the output was cleaner, better-looking, and more polished. It is also the model Anthropic positions as its most capable general-access release, with reported gains on SWE-bench Verified, Terminal-Bench 2.1, GPQA Diamond, and agentic-computer-use benchmarks source: Anthropic. The new release adds:
- Dynamic workflows in Claude Code: parallel subagents that plan, execute, and verify pieces of a task
- Effort controls: low / high / xhigh / max thinking levels
- Fast mode that runs ~2.5× faster and is 3× cheaper than the previous generation's fast mode
- The same API price as Opus 4.7
The broader pattern is consistent with what I saw: Claude Opus 4.8 is still excellent for tasks that reward structure, accuracy, and careful finishing. The real gap is not intelligence; it is style, speed-to-polish, and cost structure.
The catch nobody is talking about: no benchmarks yet
Here is the honest caveat. Z.ai launched GLM 5.2 on June 13, 2026 without publishing official benchmark scores. No independent third-party evaluation was available at launch. The company says it is strong on long coding tasks, but that is a vendor claim, not a verified result source: Z.ai status update via X. The open weights, standalone API, and chatbot were also scheduled for release the week after launch, not on day one source: Z.ai status update via X.
What does that mean in practice? Any headline saying GLM 5.2 "destroys" Claude on the numbers is speculation. My five tests are real, but they are my tests — small-sample, subjective, and focused on frontend games and pages. They are not a substitute for a SWE-bench run or a controlled coding evaluation.
If you are deciding whether to switch, treat GLM 5.2 as a promising, cost-efficient workhorse for the kinds of tasks it won in my test, not as a scientifically proven Claude-killer.
How much does each model cost in real use?
| Cost dimension | GLM 5.2 (GLM Coding Plan) | Claude Opus 4.8 (API) |
|---|---|---|
| Pricing model | Flat monthly subscription | Pay-per-token |
| Entry plan | ~$12.60/mo (Lite, yearly) | $0, plus token usage |
| Mid plan | ~$50.40/mo (Pro, yearly) | Varies with usage |
| Heavy plan | ~$112/mo (Max, yearly) | Can exceed $100s at scale |
| Standard API rate | Included in plan | $5 input / $25 output per 1M tokens |
| Fast/high-effort rate | Included in plan | $10 input / $50 output per 1M tokens |
| Context window | 1M tokens | 1M tokens |
The GLM Coding Plan wins on predictability. Claude Opus 4.8 wins if your volume is low or if you need exact per-token cost accounting. For a small business or solo builder running AI inside an IDE for hours a day, the flat plan is usually cheaper. For an agency that bills per project and has lumpy usage, per-token can be safer.
What this means for you: don't pick a side, build a routing layer
The old rule was "pay more, get the best." That rule is broken. In 2026, the best result comes from giving each job to the model that does it best and cheapest.
Here is the simplest version of a two-model workflow I use:
- Fast, creative, bulk building → GLM 5.2. Landing pages, games, content drafts, component scaffolding, repetitive coding tasks. Its flat-fee plan removes the meter-anxiety that comes with running Claude all day.
- Careful finishing, review, architecture, accuracy → Claude Opus 4.8. Code review, architectural decisions, multi-file refactors, final polish, and anything where mistakes are expensive.
- Shared memory → Obsidian or a vector store. Both models need the same brand voice, project context, and constraints so they do not start from scratch every time.
- One orchestration screen → Hermes Agent or another agent OS. Route prompts to the right model automatically, log results, and avoid tab-hopping.
That setup is more powerful than either model alone. It is also future-proof: when GLM 5.3 or Claude Opus 4.9 drops, you swap the model ID, not your whole workflow.
Related reading
FAQ
Q: Is GLM 5.2 really better than Claude Opus 4.8? A: It won 4 of 5 build tests in my comparison, but "better" depends on the task. There are no independent benchmarks yet, so treat GLM 5.2 as a strong, cost-efficient coding option rather than a proven universal winner.
Q: Can I download GLM 5.2 weights and run them locally? A: Not yet on launch day. Z.ai said open weights, a standalone API, and the chatbot would arrive the week after release. Check the Z.ai docs page for current availability.
Q: How do I use GLM 5.2 inside Claude Code?
A: Set the model IDs in ~/.claude/settings.json to glm-5.2 or glm-5.2[1m] for the 1M context variant, and use /effort to switch between high and max thinking modes source: Z.ai docs.
Q: Is Claude Opus 4.8 now overpriced? A: Not necessarily. It is still a top-tier model for careful, high-stakes work, and its API price is unchanged from Opus 4.7. The question is whether your workload uses enough tokens that a flat-fee alternative becomes cheaper.
Q: What is the safest way to test GLM 5.2? A: Start with low-stakes build tasks on the Lite plan, run the same prompts through Claude in parallel, and compare outputs for your specific codebases. Do not migrate critical production workflows before official benchmarks and open-weight availability are confirmed.
Discussion
0 comments