Claude Gets Expensive. Codex Gets Killed. The Loop Doesn't Care.
Building an AI autopilot for code, research, and workflows.
I’m building HLTM — a personal automation layer that runs AI agents on projects autonomously. The goal: drop a brief, walk away, come back to working code. The loop runs the agents. The loop doesn’t care which agents.
That was a design choice. It paid off immediately.
Model pricing changed midweek. Product roadmaps changed with it.
I did not want the system architecture to change every time a vendor sneezed.
The Interface
hltm-loop -e claude-code:opus -p prompts/dev/develop.md -n 10
hltm-loop -e codex:o3 -p prompts/dev/develop.md -n 10
hltm-loop -e opencode:glm-4.7 -p prompts/dev/develop.md -n 10
Same loop. Same prompts. Different executor. Zero code changes.
The -e flag selects the executor. The -p flag points at the prompt — a single file or a folder of prompt files. That’s the whole interface between the loop and the model.
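A minimal sketch of what that interface could look like inside the loop — this is not HLTM’s actual code, and the CLI invocations for each tool are placeholders, but it shows how an -e spec like claude-code:opus splits cleanly into a tool and a model:

```shell
#!/usr/bin/env bash
# Hypothetical executor dispatch. The flag names passed to each tool
# (--model, --prompt-file, etc.) are illustrative placeholders.

executor_cmd() {
  local spec="$1" prompt="$2"
  local tool="${spec%%:*}"    # text before the first ":"
  local model="${spec#*:}"    # text after the first ":"
  case "$tool" in
    claude-code) echo "claude --model $model --prompt-file $prompt" ;;
    codex)       echo "codex --model $model --prompt-file $prompt" ;;
    opencode)    echo "opencode --model $model --prompt-file $prompt" ;;
    *)           echo "unknown executor: $tool" >&2; return 1 ;;
  esac
}

executor_cmd claude-code:opus prompts/dev/develop.md
# → claude --model opus --prompt-file prompts/dev/develop.md
```

Adding a new vendor is one more case branch; nothing else in the loop changes.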
What Happened in One Week
Claude Opus pricing went up.
Swapped Opus to Sonnet for develop rounds. Kept Opus for review only — review is where you want the stronger model. One flag change. No refactoring.
# Before
hltm-loop -e claude-code:opus -p prompts/dev/develop.md -n 10
# After
hltm-loop -e claude-code:sonnet -p prompts/dev/develop.md -n 8
hltm-loop -e claude-code:opus -p prompts/dev/review.md -n 2
Cost dropped. Quality held.
Then Codex got deprecated.
Swapped -e codex:o3 to -e claude-code:sonnet. Ten seconds. Zero refactoring.
GLM-4.7
GLM-4.7 is a model from Zhipu AI. Chinese company. Cheaper per token than Claude for bulk tasks.
The loop ran it without modification.
Same prompt folder. Same signal protocol. Same bash routing. Different model, different API, different continent.
Worked first try.
That shouldn’t be surprising — the loop doesn’t know anything about the model. It spawns a process with a prompt and waits for a signal tag. GLM-4.7 can emit a signal tag. The loop routes it.
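The whole contract fits in a few lines. A sketch of the round loop, under the assumption that the executor is just a command that takes a prompt path and prints output (the real loop is larger, but the shape is this):

```shell
#!/usr/bin/env bash
# Sketch of the loop's core contract: spawn the executor, capture its
# output, route on the first signal tag. Model-agnostic by construction.

run_rounds() {
  local exec_cmd="$1" prompt="$2" rounds="$3"
  local i signal
  for ((i = 1; i <= rounds; i++)); do
    # The loop knows nothing about the model behind exec_cmd.
    signal="$("$exec_cmd" "$prompt" | grep -o '<loop:[a-z]*' | head -n 1)"
    case "$signal" in
      '<loop:done')    echo "done after $i round(s)"; return 0 ;;
      '<loop:blocked') echo "blocked at round $i" >&2; return 1 ;;
      *)               ;;  # stage signal (or nothing): run the next round
    esac
  done
  echo "hit round limit ($rounds)"
}
```

Any executable that reads a prompt and prints a signal tag plugs in — a vendor CLI, a local model, or a test stub.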
The Signal Protocol
The contract between any model and the loop is four XML tags.
<loop:stage next="review">
<loop:stage next="develop">
<loop:done>
<loop:blocked reason="...">
That’s it. Any model that outputs one of those tags in its response gets routed correctly.
Weaker models sometimes output malformed XML. The loop has a tolerant parser — strips whitespace, handles missing quotes, matches partial tags as fallback. The protocol is strict enough to be unambiguous, lenient enough to handle real model output.
Stronger models emit clean tags. Weaker models emit messier ones. The loop handles both.
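A tolerant parser along those lines can be sketched in a handful of bash case patterns — this is an illustration of the idea, not HLTM’s actual parser, and the exact fallback rules are assumptions:

```shell
#!/usr/bin/env bash
# Tolerant signal parsing (sketch): strip whitespace, accept missing
# quotes around attribute values, fall back to a partial-tag match.

parse_signal() {
  local out
  out="$(tr -d '[:space:]' <<<"$1")"   # whitespace-insensitive matching
  case "$out" in
    *'<loop:done'*)    echo done ;;
    *'<loop:blocked'*) echo blocked ;;
    *'<loop:stagenext="review"'* | *'<loop:stagenext=review'*)
                       echo review ;;
    *'<loop:stagenext="develop"'* | *'<loop:stagenext=develop'*)
                       echo develop ;;
    *'<loop:stage'*)   echo develop ;;  # partial tag: assume default stage
    *)                 echo none ;;
  esac
}
```

So `<loop:stage next=review>` (missing quotes) and a tag split across lines both still route, while output with no tag at all is reported as such rather than guessed at.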
Methodology Is Also a Flag
The -p flag points to a folder of prompt files.
hltm-loop -e claude-code:sonnet -p prompts/dev/ -n 10 # development workflow
hltm-loop -e claude-code:sonnet -p prompts/debug/ -n 5 # debugging workflow
hltm-loop -e claude-code:sonnet -p prompts/refactor/ -n 8 # refactor workflow
Swap the folder. Different workflow. Same infrastructure.
BMAD was a folder of prompts. GSD was a folder with one prompt. My two-phase loop is a folder with two prompts. The loop doesn’t know the difference between any of them.
This is what v6’s reference file architecture was pointing at — methodology as files, not as hardcoded logic. The bash loop took that idea all the way.
The Design Principle
Loop is infrastructure. Prompts are policy. Keep them separate.
The moment you hardcode a tool into your orchestrator, you’ve built vendor lock. Not just to the model — to the model’s API format, its tool protocol, its output conventions.
The moment the loop knows what Claude’s tool calls look like, you can’t swap Claude.
The loop knows one thing: there’s a process to spawn, a prompt to give it, a signal to wait for. That’s a stable interface. Everything else is an implementation detail that belongs in the executor or the prompt, not in the loop.
The Practical Result
Three different models in one week. No architectural changes.
Claude Sonnet for development. Claude Opus for review. GLM-4.7 for bulk generation tasks where quality requirements are lower. Codex for a legacy project that had existing Codex prompts.
All of them ran on the same 300 lines of bash.
The loop didn’t care. Neither did I.
Next: Natural language was the problem. Six XML tags were the fix.