Loop Engineering: Why the Best AI Agents in 2026 Are Built as Loops, Not Prompts

Verdict: The most productive AI users are no longer writing longer prompts; they are writing loops that run the prompts for them. Loop engineering is the practice of replacing yourself as the prompter with a system that discovers work, executes it, checks it, and learns from it. It is practical now because capable frontier models, built-in cron/scheduling, isolated workspaces, and verifier agents have all arrived at the same time.

Last verified: 2026-06-16 · Deterministic loops = test/compile gates · Non-deterministic loops = reviewer/quality gates · Tool of the moment: Claude Code /goal, OpenAI Codex Automations, Hermes cron + subagents

1. What loop engineering actually means

For most of 2024 and 2025, getting value from a coding agent meant typing a prompt, reading the response, and typing the next prompt. The skill was prompt engineering: crafting instructions so the model stayed on track.

Loop engineering flips that. Your job is not to write the prompt that completes the task. Your job is to design the system that decides what prompt to write next, runs it, reads the result, and decides whether to keep going.

The loop has five stages:

State check: read the current situation (tests, logs, repo, inbox).
Decision: choose the next action.
Execution: write code, call a tool, run a command.
Feedback: capture the result (test output, screenshot, diff, error).
Verification: decide whether the goal is met. If not, loop back to the Decision stage.

With prompt engineering, you only ever controlled the Decision stage, and you did it by hand. With loop engineering, you design all five stages to run without you.
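The five stages can be sketched as one small driver function. This is a minimal illustration rather than any particular tool's implementation: `run_loop`, `LoopResult`, and the four callables are hypothetical names introduced here, and the hard iteration cap is an added safety assumption.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


@dataclass
class LoopResult:
    done: bool                      # True if the verification gate passed
    iterations: int                 # how many decide/execute rounds ran
    history: List[str] = field(default_factory=list)  # feedback per round


def run_loop(
    read_state: Callable[[], Any],
    decide: Callable[[Any, List[str]], str],
    execute: Callable[[str], str],
    verify: Callable[[Any], bool],
    max_iterations: int = 10,       # assumption: a hard cap so the loop always ends
) -> LoopResult:
    history: List[str] = []
    state = read_state()                      # 1. state check
    for iteration in range(1, max_iterations + 1):
        action = decide(state, history)       # 2. decision
        feedback = execute(action)            # 3. execution
        history.append(feedback)              # 4. feedback
        state = read_state()                  # re-read the world after acting
        if verify(state):                     # 5. verification
            return LoopResult(True, iteration, history)
    return LoopResult(False, max_iterations, history)
```

In a real loop, `decide` would call a model and `execute` would run a tool or command; keeping them as plain callables makes the control flow testable without either.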

2. Why this is happening now (and not six months ago)

Three things changed in the first half of 2026 that make loops reliable enough to ship:

Shift	Why it matters
Frontier models got better at long tasks	Anthropic says its June 2026 model Claude Fable 5 "stays focused across millions of tokens" and performs best on long, complex tasks; its lead grows as tasks get harder (Anthropic Fable 5 announcement).
First-party loop primitives shipped	Claude Code added a `/goal` command in May 2026 that runs until a completion condition is met, tracked by time, turns, and tokens (The New Stack). OpenAI Codex added Automations with a triage inbox. Hermes has built-in cron and subagents.
Compute limits relaxed	Anthropic doubled Claude Code 5-hour rate limits on May 6, 2026, removed peak-hour throttling, and signed a >300 MW SpaceX Colossus 1 GPU deal, so long runs are less likely to hit a usage ceiling than they were under 2025 rate limits (Anthropic higher limits / SpaceX).

The result: an agent can now hold a task across dozens of turns without the conversation drifting or the user hitting a ceiling.

3. The two loop types (and when to use each)

Not every task can be checked the same way. Split loops into two buckets:

Deterministic loops: "I know what done looks like"

Use these when the success condition is objective:

Tests pass.
Code compiles.
A specific error count hits zero.
A deployment health check returns 200.

These are the safest loops to leave unattended because the verdict is mechanical. A test harness is the classic deterministic loop.
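A mechanical verdict usually reduces to a process exit code. The sketch below assumes the project's gate is a shell command that exits with 0 on success; `gate_passes` and `TEST_COMMAND` are hypothetical names, and pytest is only an example of such a command.

```python
import subprocess
import sys
from typing import List

# Hypothetical default gate: assumes the project runs its tests with pytest.
TEST_COMMAND: List[str] = [sys.executable, "-m", "pytest", "-q"]


def gate_passes(command: List[str], timeout_s: int = 600) -> bool:
    """Return True only if the command runs and exits with status 0."""
    try:
        completed = subprocess.run(
            command,
            capture_output=True,   # keep test output out of the loop's own logs
            text=True,
            timeout=timeout_s,     # a hung test run counts as a failure
        )
    except (subprocess.TimeoutExpired, FileNotFoundError):
        return False
    return completed.returncode == 0
```

Treating a timeout or a missing executable as a failure keeps the gate conservative: the loop never reports success because the check could not run.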

Non-deterministic loops: "A person has to judge this"

Use these when success is subjective:

Does the UI look generic or branded?
Is the tone right for the audience?
Does the refactor preserve the original intent?

These loops need a verifier agent — a separate model with different instructions that grades the output. The writer and the checker should not be the same model; a model grading its own work is consistently too lenient.

Type	Verdict	Example	Best platform
Deterministic	Pass/fail gate	Fix failing CI until green	Claude Code `-p` + tests, Hermes cron + GitHub PR skill
Non-deterministic	Review/rubric gate	Redesign a landing page until it matches brand guidelines	Hermes subagent + separate reviewer model, Claude Code subagents

4. The six building blocks of a production loop

A loop that runs overnight without breaking your repo or your budget needs more than a while loop. It needs six primitives:

Automations / cron: scheduled discovery of work (new CI failures, new PRs, new emails, new market data).
Isolated workspaces: git worktrees (one per task or thread) or Docker containers, so parallel agents do not clobber each other.
Skills / AGENTS.md: codified project knowledge so the agent does not re-learn your conventions every run.
Connectors / MCP: tools the agent can call, such as GitHub, Slack, a browser, a database, or cloud APIs.
Sub-agents / verifier agents: one agent makes, another checks.
Memory: state that survives between runs, such as CLAUDE.md, Hermes MEMORY.md, or persistent files.

The New Stack’s June 2026 breakdown of loop engineering maps these same primitives across Claude Code and OpenAI Codex; both tools now support scheduled execution, worktrees, skills, MCP connectors, subagents, and memory (The New Stack).
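Of the six primitives, the isolated workspace is the easiest to show concretely. The sketch below wraps the standard `git worktree add` and `git worktree remove` commands; the helper names, the sibling-directory layout, and the `loop/` branch prefix are illustrative assumptions, not a convention of any tool named above.

```python
import subprocess
from pathlib import Path


def create_worktree(repo: Path, name: str) -> Path:
    """Check out a new branch into its own directory next to the repo."""
    path = repo.parent / f"{repo.name}-{name}"   # assumed layout: sibling directory
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "add", "-b", f"loop/{name}", str(path)],
        check=True,            # raise if git refuses (e.g. the branch already exists)
        capture_output=True,
        text=True,
    )
    return path


def remove_worktree(repo: Path, path: Path) -> None:
    """Delete the worktree directory; the branch itself is kept for review."""
    subprocess.run(
        ["git", "-C", str(repo), "worktree", "remove", "--force", str(path)],
        check=True,
        capture_output=True,
        text=True,
    )
```

Each agent run gets its own directory and branch, so two parallel runs never edit the same files, and a failed run can be discarded without touching the main checkout.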

5. A practical loop you can build today

Here is a deterministic example that runs on Hermes Agent, an open-source autonomous agent from Nous Research:

Trigger: cron every 15 minutes.
State check: read production health checks / test results for your deployed app.
Decision: if all checks pass, log and stop. If something fails, open a fix loop.
Execution: launch a coding agent in a fresh git worktree with the failing tests as the goal.
Feedback: run the test suite after each change.
Verification: stop only when the health checks or tests pass.
Memory: write a short note about the failure pattern and fix so the next occurrence is faster.

Hermes is built for this shape: it is a long-running daemon with built-in cron, multi-provider model support, persistent memory, and autonomous skill creation (Hermes Agent docs).
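The steps above can be sketched as a single function that a scheduler calls every 15 minutes. This is not Hermes code: `is_healthy`, `run_agent`, and `tick` are hypothetical helpers, the attempt cap is an added assumption, and `run_agent` assumes the `claude` CLI is installed and uses the `-p` non-interactive mode mentioned in this article.

```python
import subprocess
import urllib.request
from typing import Callable


def is_healthy(url: str, timeout_s: float = 10.0) -> bool:
    """State check: True only if the endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_s) as response:
            return response.status == 200
    except (OSError, ValueError):   # network/HTTP errors, timeouts, malformed URLs
        return False


def run_agent(prompt: str, workdir: str) -> int:
    """Execution: hand the task to a coding agent inside an isolated checkout."""
    return subprocess.run(["claude", "-p", prompt], cwd=workdir).returncode


def tick(check: Callable[[], bool], fix: Callable[[int], None], max_attempts: int = 3) -> str:
    """One scheduled run: do nothing when healthy, otherwise fix until the gate passes."""
    if check():
        return "healthy"
    for attempt in range(1, max_attempts + 1):
        fix(attempt)                # e.g. a lambda that calls run_agent(...)
        if check():                 # verification: re-run the same gate
            return f"fixed after {attempt} attempt(s)"
    return "escalate"               # budget spent: a human needs to look
```

Passing `check` and `fix` in as callables keeps the control flow independent of the agent, so the same `tick` works whether the fix step is Claude Code, Hermes, or a plain script.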

For a subjective task, swap the test gate for a review gate:

Writer agent: generates the UI or content.
Verifier agent: checks it against a rubric (brand voice, no AI-slop patterns, accessibility rules).
Loop: writer fixes → verifier re-checks → repeat until the rubric passes or a max-iteration budget is hit.
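The writer/verifier cycle can be sketched the same way. In the example below both roles are plain callables so the control flow is visible; in practice each would wrap a call to a different model. `Review` and `review_loop` are hypothetical names, and the default budget of five iterations is an arbitrary assumption.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Review:
    passed: bool        # did the draft satisfy every rubric item?
    notes: List[str]    # concrete problems for the writer to fix


def review_loop(
    write: Callable[[List[str]], str],      # writer: takes reviewer notes, returns a draft
    verify: Callable[[str], Review],        # verifier: grades a draft against the rubric
    max_iterations: int = 5,
) -> Tuple[str, Review, int]:
    if max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    notes: List[str] = []
    for iteration in range(1, max_iterations + 1):
        draft = write(notes)                # writer fixes (or writes the first draft)
        review = verify(draft)              # verifier re-checks
        if review.passed:
            return draft, review, iteration
        notes = review.notes                # feed the critique into the next round
    return draft, review, max_iterations    # budget hit: return the last attempt
```

Returning the final `Review` even on failure matters: when the budget runs out, the unresolved notes tell a human exactly where the writer got stuck.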

6. The hard parts no one talks about

Loop engineering is not free. The costs move from your time to three new risks:

Token cost. Unattended loops spend money unattended. A deterministic loop with a tight test harness is cheap because it stops fast. A non-deterministic loop with a loose rubric can run for hours.
Correctness. A loop will satisfy the verification gate you gave it, not the goal you meant. If your tests are weak, the loop will pass weak tests.
Comprehension debt. If the agent ships code while you sleep, you wake up owning code you have not read. Two engineers can run the same loop and get opposite outcomes depending on whether they use it to accelerate understanding or avoid it.

The fix is not to avoid loops. It is to make the harness tight before you make the loop long.
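One way to make the harness tight is to give every loop an explicit budget and check it before each turn. The sketch below tracks the same three quantities the article says `/goal` tracks (time, turns, and tokens), but `Budget` is a hypothetical class written for illustration, not part of any tool's API.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Budget:
    max_turns: int
    max_tokens: int
    max_seconds: float
    turns: int = 0
    tokens: int = 0
    started: float = field(default_factory=time.monotonic)

    def charge(self, tokens_used: int) -> None:
        """Record one completed turn and the tokens it consumed."""
        self.turns += 1
        self.tokens += tokens_used

    def exhausted(self) -> Optional[str]:
        """Return the reason to stop, or None if the loop may continue."""
        if self.turns >= self.max_turns:
            return "turn limit"
        if self.tokens >= self.max_tokens:
            return "token limit"
        if time.monotonic() - self.started >= self.max_seconds:
            return "time limit"
        return None
```

A loop would call `budget.exhausted()` at the top of each iteration and `budget.charge(...)` after each model call, so the worst-case spend is known before the loop is left unattended.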

7. What this means for you

If you build with AI today, your leverage is moving up one layer of abstraction:

This week: identify one recurring task with a clear pass/fail gate (a failing test, a stale report, a daily data pull).
This month: turn it into a deterministic loop using the scheduler and tools you already have. Hermes cron, Claude Code -p, or a simple GitHub Action all work.
This quarter: add a verifier agent for one subjective task — UI review, copy review, or architecture review — so the maker and checker are separated.

Start small. One scheduled triage automation plus one verifier agent captures most of the value for a fraction of the token cost of a fully autonomous agent.

FAQ

What is loop engineering in simple terms? It is the practice of building systems that prompt and supervise AI agents automatically, instead of typing every prompt yourself.

Is loop engineering the same as reinforcement learning? No. It borrows the idea of feedback-driven iteration, but the model weights are never updated. The agent uses feedback signals (test passes, rubric scores) to choose its next action at run time.

Can I use Claude Code in a loop? Yes. Claude Code has a non-interactive claude -p mode and a /goal command that runs until a completion condition is met (Claude Code cheat sheet, ExplainX on /goal).

What is the cheapest way to start? Run a deterministic loop first. A cron job that checks a health endpoint and opens a worktree fix only on failure costs almost nothing when nothing is broken.

What is the biggest mistake teams make? Letting the loop run before the verification gate is trustworthy. A bad loop ships bad code faster.

Sources

Anthropic. "Claude Fable 5 and Claude Mythos 5." June 9, 2026. https://www.anthropic.com/news/claude-fable-5-mythos-5
Anthropic. "Higher usage limits for Claude and a compute deal with SpaceX." May 6, 2026. https://www.anthropic.com/news/higher-limits-spacex
Anthropic. Claude Code product page. https://www.anthropic.com/product/claude-code
ExplainX. "Claude Code 2.1.139 adds /goal command." May 12, 2026. https://explainx.ai/blog/claude-code-goal-command-long-running-agents-2026
Hermes Agent documentation. https://hermes-agent.nousresearch.com/docs/
Hermes Atlas. "Hermes Agent vs Claude Code." Updated April 2026. https://hermesatlas.com/guide/vs-claude-code
Janakiram MSV. "The Anthropic leader who built Claude Code says he ditched prompting — now he just writes loops." The New Stack, June 10, 2026. https://thenewstack.io/loop-engineering/
OpenClaw. "Agent loop." https://docs.openclaw.ai/concepts/agent-loop

Updates & Corrections

2026-06-16 — Article published. Pricing and availability facts are volatile; Anthropic suspended Claude Fable 5 / Mythos 5 access on June 12, 2026, following a US government export-control directive.
