A field guide

Stop prompting agents. Build the loop that prompts them.

In their words

You shouldn’t be prompting coding agents anymore. You should be designing loops that prompt your agents.

Peter Steinberger, creator of OpenClaw

I don’t prompt Claude anymore. I have loops running. They’re the ones prompting Claude and figuring out what to do. My job is to write loops.

Boris Cherny, creator of Claude Code

Where it sits

The newest floor in a four-story stack

Each layer makes the one below it more useful. Prompt, context, and harness make a single run better. The loop makes recurring work reliable once the human is no longer in the inner loop.

Layer	What it is	The question it answers
PROMPT	Prompt engineering: the words you give the model.	What should I say to the model?
CONTEXT	Context engineering (Karpathy): the state and knowledge it sees.	What should the model see?
HARNESS	The tools, permissions, tests, and sandbox around one agent run.	What should surround one run?
LOOP	The recurring system that fires, verifies, remembers, and decides.	What system runs the work for me?


The shape of a loop

Seven steps, then a decision

A loop is not a cron job. A cron job repeats blindly. A loop discovers, verifies, persists, and decides what is next, then stops for an honest reason.

Trigger

A schedule or an event fires it. The heartbeat, not you.

Discover

Read CI failures, issues, alerts, or a task list to find the work.

Delegate

The maker proposes the smallest change that could be right.

Act

In an isolated worktree or sandbox, so parallel agents do not collide.

Verify

A separate checker grades the result. The maker does not get to mark its own work.

✖ fail → feed the evidence back and retry
✓ pass → persist state, then decide

Persist

Write done and next to disk. Read it first on the following turn.

Decide

Choose the next action, or stop: goal met, budget spent, stalled, needs a human.

A loop that cannot stop is a bug. Build the stop conditions first. They are what let you walk away.
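The steps above can be sketched as a small driver. This is a minimal illustration, not a reference implementation: `run_loop`, `Verdict`, and the `discover`, `maker`, and `checker` callables are hypothetical names standing in for whatever finds the work, proposes the change, and grades it. The attempt cap is written into the loop from the start, so it always has a way to stop.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Verdict:
    """What the checker returns (hypothetical shape, for illustration)."""
    passed: bool
    evidence: str  # what the checker saw; fed back to the maker on failure


def run_loop(
    discover: Callable[[], Optional[str]],
    maker: Callable[[str, str], str],
    checker: Callable[[str, str], Verdict],
    max_attempts: int = 3,
) -> str:
    """Run one task through make -> verify, retrying with evidence.

    Returns the stop reason as a string.
    """
    task = discover()
    if task is None:
        return "goal met"  # nothing to fix: the work is already green

    evidence = ""
    for _ in range(max_attempts):
        change = maker(task, evidence)    # the maker proposes a change
        verdict = checker(task, change)   # a separate checker grades it
        if verdict.passed:
            return "goal met"
        evidence = verdict.evidence       # fail: feed the evidence back and retry
    return "budget spent"                 # the cap tripped; stop honestly
```

The maker never returns a pass or fail of its own. Only the checker's `Verdict` can end the loop with "goal met".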


What it is made of

Five building blocks, plus memory

The same parts show up whether you build in Claude Code or Codex. Only the names of the levers change.

Block	Its job	Claude Code	Codex
Automations	the heartbeat that fires the loop	/loop, /goal, hooks	Automations tab
Worktrees	isolate parallel agents	git worktree	worktree per thread
Skills	codify project knowledge	SKILL.md	SKILL.md
Connectors	touch real tools and data	MCP + plugins	Connectors (MCP)
Sub-agents	keep maker separate from checker	.claude/agents/	.codex/agents/
+ Memory	done and next, on disk	markdown, issue tracker	markdown, issue tracker
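As an illustration of the Worktrees row, here is a minimal sketch of giving each agent its own checkout with `git worktree add`. The helper names, the `agent/<task_id>` branch naming, and the directory layout are assumptions made for this example; they are not conventions of Claude Code or Codex.

```python
import subprocess
from pathlib import Path


def worktree_add_argv(worktrees_dir: Path, task_id: str) -> list[str]:
    """Build the `git worktree add` command for one agent's private checkout.

    The branch name `agent/<task_id>` is a hypothetical convention.
    """
    path = worktrees_dir / task_id
    branch = f"agent/{task_id}"
    # `git worktree add -b <new-branch> <path>` creates the branch and checks
    # it out in a separate directory that shares the same repository.
    return ["git", "worktree", "add", "-b", branch, str(path)]


def create_worktree(repo: Path, worktrees_dir: Path, task_id: str) -> Path:
    """Create the worktree from inside `repo`; raises if git reports an error."""
    subprocess.run(worktree_add_argv(worktrees_dir, task_id), cwd=repo, check=True)
    return worktrees_dir / task_id
```

Two agents working on `fix-123` and `fix-124` then edit different directories on different branches, so neither can overwrite the other's uncommitted files.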


Build it headless

The command that makes a loop a loop

Run the agent non-interactively, with pre-approved tools and machine-readable output. A loop is just this command driven by something other than you (a while loop, a cron job, a CI run), with a gate that skips the run when the work is already green, and a cap on iterations or spend.

/loop · repeat on a cadence

Run again on a schedule, roughly like cron. Useful, but it does not know whether it succeeded.

/goal · run until verified

Repeat until a verifiable condition is true, checked by a separate model each turn. The workhorse of serious loops.

# Claude Code: one-shot, machine-readable, pre-approved tools
claude -p "fix the cause of these failures; don't weaken tests" \
  --allowedTools "Read,Edit,Bash" --output-format json
  # returns the result and total_cost_usd

# Codex: read-only by default; opt up explicitly; JSONL + resume
codex exec --json "summarize the failing tests and fix the cause"
codex exec resume --last "now fix the race condition you found"
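One way to drive the Claude Code command above from a script is sketched below, with the gate and the cap made explicit. `drive`, `claude_once`, and `is_green` are hypothetical names, and the default caps are arbitrary. The only details taken from the text are the command line itself and the `total_cost_usd` field in its JSON output. Treat it as a sketch under those assumptions.

```python
import json
import subprocess
from typing import Callable


def claude_once(prompt: str) -> dict:
    """Run the headless command from the text once and parse its JSON output."""
    proc = subprocess.run(
        ["claude", "-p", prompt,
         "--allowedTools", "Read,Edit,Bash",
         "--output-format", "json"],
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)


def drive(
    prompt: str,
    is_green: Callable[[], bool],                 # the gate, e.g. "does the test suite pass?"
    run_once: Callable[[str], dict] = claude_once,
    max_runs: int = 5,                            # cap on iterations (arbitrary default)
    max_usd: float = 2.00,                        # cap on spend (arbitrary default)
) -> dict:
    """Call the agent until the gate is green or a cap trips."""
    runs, spent = 0, 0.0
    while True:
        if is_green():                            # gate: skip when work is already green
            return {"stop": "goal met", "runs": runs, "spent_usd": spent}
        if runs >= max_runs or spent >= max_usd:  # cap: never run unbounded
            return {"stop": "budget spent", "runs": runs, "spent_usd": spent}
        out = run_once(prompt)
        runs += 1
        spent += float(out.get("total_cost_usd", 0.0))
```

Because `run_once` is a parameter, the same driver can be exercised with a stub in tests, or pointed at a different headless command.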

The verifier is the asset; the maker is a commodity. Makers improve for free with every model release. The encoded, executable definition of "correct for us" is the thing you actually own.


Stay honest

What it asks of you, and what it costs

The point of a loop is to automate the inner loop — the per-turn prompting — while you stay the engineer at the outer loop: verification, comprehension, and intent.

The four honest stops

goal met: the verifier confirms the done-condition
budget spent: an iteration, token, time, or dollar ceiling tripped
stalled: the same failure twice, with no new evidence
needs a human: high-risk or ambiguous, so it escalates. This is a success state, not a failure
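The four stops can be written as one small function that the loop calls each turn. The sketch below is illustrative: `LoopState`, its field names, and the order of the checks are assumptions, and "the same failure twice" is approximated by comparing the last two failure signatures.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class LoopState:
    """Hypothetical per-loop state; field names are for illustration only."""
    verified: bool = False       # set by the checker, never by the maker
    needs_human: bool = False    # set when the task is high-risk or ambiguous
    iterations: int = 0
    max_iterations: int = 10     # stands in for any iteration/token/time/dollar ceiling
    failures: list = field(default_factory=list)  # failure signatures, newest last


def stop_reason(state: LoopState) -> Optional[str]:
    """Return one of the four honest stops, or None to keep going."""
    if state.verified:
        return "goal met"
    if state.needs_human:
        return "needs a human"   # escalation is a success state, not a failure
    if state.iterations >= state.max_iterations:
        return "budget spent"
    if len(state.failures) >= 2 and state.failures[-1] == state.failures[-2]:
        return "stalled"         # same failure twice, no new evidence
    return None
```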

Bake these into every maker

Fix only the cause. Do not widen scope.
Never weaken or delete a test to make it pass.
The maker does not declare itself done. A separate checker does.
Write progress to a file each turn; read it first the next turn.
Keep an explicit stop, so the loop can halt without you.
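The fourth rule, writing progress each turn and reading it first on the next, might look like the sketch below. The JSON file and the `done`/`next` keys are assumptions chosen for brevity. The same idea works with a markdown file or an issue tracker.

```python
import json
from pathlib import Path


def load_progress(path: Path) -> dict:
    """Read progress first, every turn; start empty if no file exists yet."""
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    return {"done": [], "next": []}


def save_progress(path: Path, progress: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a half-written file."""
    tmp = path.with_suffix(path.suffix + ".tmp")
    tmp.write_text(json.dumps(progress, indent=2), encoding="utf-8")
    tmp.replace(path)


def finish_task(path: Path, task: str) -> dict:
    """Move one task from `next` to `done` and persist the result."""
    progress = load_progress(path)
    if task in progress["next"]:
        progress["next"].remove(task)
    if task not in progress["done"]:
        progress["done"].append(task)
    save_progress(path, progress)
    return progress
```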

The two ceilings

Money. The token cost of an N-step loop trends toward O(N²), because each turn re-bills the whole history so far. Anthropic reports agents use roughly 4× the tokens of chat, and multi-agent systems roughly 15×. Bound it with budgets, gating, caching the repeated context, compaction, and right-sized models.
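To see where the quadratic comes from, assume a simplified model in which every turn resends a fixed base context plus everything added by the turns before it. The function name and the token counts in the usage note are illustrative, not measurements.

```python
def billed_input_tokens(turns: int, base: int, per_turn: int) -> int:
    """Total input tokens billed when each turn resends the full history.

    Turn k (counting from 0) sends the base context plus k earlier turns:
        base + k * per_turn
    Summed over all turns this equals
        turns * base + per_turn * turns * (turns - 1) / 2
    and the second term grows with the square of the turn count.
    """
    return sum(base + k * per_turn for k in range(turns))
```

With a 2,000-token base and 1,000 tokens added per turn, `billed_input_tokens(10, 2000, 1000)` is 65,000, while `billed_input_tokens(20, 2000, 1000)` is 230,000. Doubling the turns costs about 3.5 times as much. Caching the repeated context or compacting the history shrinks the quadratic term in that sum.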

You. The orchestration tax. You can only merge as many diffs as you can read. Scale the fleet to your review rate — usually a low single digit — not to the tool’s lane count.

The debt trilogy

Debt	Lives in	Agent can pay it down?
Technical	the code	Yes
Comprehension	people	Recoverable: ask it to explain
Intent (the why)	artifacts	No. It must come from a human

Watch for cognitive surrender: when the model’s output quietly becomes your answer and you stop forming an independent view.

The shortest version

Make the loop cheap to trust.

The separate checker, the disk-backed memory, the budgets, the gate, the honest stop — every part is in service of that one sentence. If a step does not make the loop’s output cheaper to trust, it is decoration.
