Sergey Kopanev: you sleep — agents ship

Building AI Autopilot · Part 13

Natural Language Was the Problem. Six XML Tags Were the Fix.


Building AI Autopilot for code, research, and workflows.

The bash loop can’t parse intentions. It can only parse text.

HLTM is a personal automation layer that runs AI agents autonomously. A bash loop — hltm-loop.sh — spawns agents with fresh context every round. The goal: drop a brief, come back to working code.

The agent was writing: “I’ll now proceed to review the changes and verify correctness before moving to the next step…” — and the loop had no idea if it was done, starting, or stuck.

It was reading natural language and guessing.

Guessing is unreliable.

The Signal Problem

Every round, the bash loop needed to know one thing: what happened, and what happens next.

Natural language gave it everything except that.

“I’ve completed the implementation and believe the code is ready for review.” Is it done? Is it asking for review? Is the task closed? The loop had to infer. It inferred wrong often enough to matter.

The loop needed a contract. Not prose. A signal.

Six Tags

<loop:update>progress message</loop:update>
<loop:stage>review:task-123</loop:stage>
<loop:human>need input on X</loop:human>
<loop:failed>stuck after 3 attempts</loop:failed>
<loop:done>summary of what was completed</loop:done>

That’s it. The entire protocol.

Each tag has one job. No overlap. No ambiguity.

<loop:update> — still working, here’s where I am.
<loop:stage> — move to a different stage, carry this task ID.
<loop:human> — stop the loop, wait for a human.
<loop:failed> — I’m stuck, escalate.
<loop:done> — task complete, summarized.
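On the bash side, that one-job-per-tag property is what makes dispatch trivial. A minimal sketch, assuming the agent’s whole output lands in one variable — the function name and case arms are illustrative, not the actual hltm-loop.sh code:

```shell
# Hypothetical dispatch step: map the agent's output to one action.
route_signal() {
  case "$1" in
    *'<loop:done>'*)   echo done ;;      # task complete: close it out
    *'<loop:failed>'*) echo failed ;;    # stuck: escalate
    *'<loop:human>'*)  echo human ;;     # block and wait for a person
    *'<loop:stage>'*)  echo stage ;;     # switch prompt, carry task ID
    *'<loop:update>'*) echo update ;;    # still working: run another round
    *)                 echo malformed ;; # no tag: fail the round visibly
  esac
}
```

One tag, one branch — there is nothing to infer.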

The Rules Are Half the Protocol

The tags don’t work without two rules baked into every prompt:

Emit the tag. Stop immediately. Do not continue writing.

One signal per output. No commentary after the tag.

Without those rules, agents emit the tag then keep going. “I’ve completed the work. <loop:done> However, I also noticed a potential issue with…” The tag is now buried in prose. The loop reads it, acts on it, misses the caveat. The caveat mattered.

Stop is not optional. Stop is part of the signal.
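The prompt carries the rule, but the loop can also guard itself. A defensive sketch — the sed pattern is an assumption, not the shipped parser — that keeps only the text up to the first closing tag, so a buried tag can’t smuggle trailing commentary past the round:

```shell
# Illustrative guard: drop anything after the first closing </loop:...>
# tag. If the model keeps writing past the signal, the loop never sees it.
truncate_after_signal() {
  printf '%s\n' "$1" | sed 's|\(</loop:[a-z]*>\).*|\1|'
}
```

Enforced on both sides, the stop rule survives a disobedient model.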

Deterministic Routing

Natural language output means the loop has to guess what happened.

Structured tags mean deterministic routing.

<loop:stage>review:task-123</loop:stage> — bash reads that, switches to the review prompt, passes task-123 as context. No interpretation. No inference. One possible action.
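Pulling the stage and task ID apart is a few lines of shell. A sketch of that parse — the sed pattern and variable names are assumptions, not hltm-loop.sh internals:

```shell
# Illustrative parse of a stage signal into its two parts.
signal='<loop:stage>review:task-123</loop:stage>'
payload=$(printf '%s' "$signal" | sed -n 's|.*<loop:stage>\(.*\)</loop:stage>.*|\1|p')
stage=${payload%%:*}   # "review"   -> which prompt to load next
task=${payload#*:}     # "task-123" -> passed along as context
```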

<loop:human> blocks the loop entirely and waits. That’s the only way the agent asks for help. Not prose. Not “you might want to check…” A tag that actually stops execution.

<loop:failed> fires after three identical errors. The agent stops trying instead of looping forever on the same broken approach. Three strikes and it escalates. The loop handles the escalation, not the agent.
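One way to count the three strikes — a sketch under the assumption that the loop tracks the last error string between rounds; the names are illustrative:

```shell
# Hypothetical three-strikes counter. Succeeds (exit 0) once the same
# error string has been seen three times in a row -- time to escalate.
LAST_ERROR=''
ERROR_COUNT=0
note_error() {
  if [ "$1" = "$LAST_ERROR" ]; then
    ERROR_COUNT=$((ERROR_COUNT + 1))
  else
    LAST_ERROR=$1    # a different error resets the count
    ERROR_COUNT=1
  fi
  [ "$ERROR_COUNT" -ge 3 ]
}
```

A new error string resets the counter, so only the same broken approach trips the escalation.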

The Weaker Model Problem

Not every round runs on Claude.

For weaker models — GLM-4.7, smaller local models — the parser is tolerant. It accepts any closing </loop:*> tag as a valid fallback. The agent doesn’t need to get the exact tag name right. Close enough counts.
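That tolerance can be as simple as one pattern. A sketch — the regex is an assumption for illustration — where any closing </loop:...> tag counts as a signal, even with the name slightly off:

```shell
# Illustrative fallback check: any closing </loop:...> tag is accepted
# as evidence the agent emitted a signal, exact name or not.
has_signal() {
  printf '%s' "$1" | grep -q '</loop:[a-z]*>'
}
```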

The prompts also use specific words to keep instructions unambiguous at any capability level. “Emit” for signals. “Run” for commands. Not interchangeable. Not up for interpretation.

A weaker model that can’t write clean prose can still emit a tag and stop.

Before and After

Before: the loop was reading agent output, pattern-matching for phrases like “task complete” or “ready for review”, inferring state. It got it right most of the time. Most of the time is not a protocol.

After: the loop reads one tag per round. Routes deterministically. Either it finds a tag and acts, or the output is malformed and the round fails visibly.

“Maybe done?” became a state machine.


Next: “Works on my machine” is not a passing test.