Sergey Kopanev: you sleep — agents ship

Building AI Autopilot · Part 9

48 Lines. Still Wrong.


Building AI Autopilot for code, research, and workflows.

HLTM had a new problem after I cleaned up the YAML.

The agents were isolated. The state was simple. The orchestrator still ran on Claude.

And Claude kept making decisions where I needed determinism.

I wanted routing logic I could reason about. Not vibes from a model with a long context window.

The Router Idea

v5 had one insight worth keeping: the orchestrator shouldn’t think.

You are a ROUTER.
Do NOT explore code or do any discovery.
Phase skills handle everything.

That’s it. Read session.yaml. Call the right skill. Write the next phase. Loop.

No reasoning. No exploration. No “let me consider the best approach here.”

Just routing.

The autopilot went from something that tried to understand the project to something that only read a state file and called a function. The agents — brief, planning, implementation, review — handled everything else. They didn’t know about each other. They each got one reference file with their instructions and nothing else.

Cleaner. Much cleaner.
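What "just routing" looks like, sketched in bash. The state-file format and the skill command names besides /hltm-implementation are assumptions for illustration, not the actual HLTM code:

```shell
#!/usr/bin/env bash
# Hypothetical sketch of one iteration of the router.
# Assumes session.yaml carries a top-level "phase:" key.
set -euo pipefail

# Sample state file so the sketch is self-contained.
cat > session.yaml <<'EOF'
phase: implementation
EOF

# Read the state. No reasoning, no exploration: one value in, one route out.
phase=$(sed -n 's/^phase: *//p' session.yaml)

case "$phase" in
  brief)          skill="/hltm-brief" ;;
  planning)       skill="/hltm-planning" ;;
  implementation) skill="/hltm-implementation" ;;
  review)         skill="/hltm-review" ;;
  *) echo "unknown phase: $phase" >&2; exit 1 ;;
esac

echo "routing to $skill"
```

The whole point is the case statement: an unknown phase is an error, not an invitation to improvise.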

v6: Push It Further

v6 took the same idea and went harder.

The autopilot shrank from 103 lines to 48. Every piece of logic that didn’t belong in the router got extracted into its own file:

hltm-autopilot/
├── SKILL.md              ← 48 lines, pure FSM
└── references/
    ├── errors.md         ← how to handle failures
    ├── logging.md        ← what to print
    ├── result-handling.md ← how to process skill output
    ├── scope.md          ← what autopilot is allowed to touch
    └── session.md        ← state file format

The autopilot didn’t know how to handle errors — it read errors.md. Didn’t know what to log — read logging.md. The skill itself was a dispatcher. All behavior lived in files.

This was a good idea.

It meant you could change how the system behaved without touching the orchestrator. Swap result-handling.md and the whole error recovery strategy changes. Swap the phase reference files and you’ve got a different methodology running on the same infrastructure.

What Was Still Wrong

The FSM transitions were probabilistic.

That’s the problem with using an LLM as a state machine. A real state machine reads a value and makes a deterministic decision. Claude read session.yaml, saw phase: implementation, and decided to call /hltm-implementation.

Usually right.

Sometimes it read the phase, looked at the context accumulated over the session, and made a different call. Skipped a phase. Reran one. Decided the project needed replanning.

Not because the state said so. Because it seemed reasonable.

One malformed JSON result from a skill and the transition broke silently. The orchestrator received partial output, tried to interpret it, guessed the next state. Moved on.

I didn’t know until three phases later when something was wrong and I had no idea why.
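A deterministic runner can refuse to guess. One way, sketched below: validate the result before transitioning and halt loudly on malformed output. The validator choice (python3's json module) and the result shape are my assumptions, not the actual HLTM code:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: validate a skill's result before transitioning.
set -euo pipefail

# Simulate a truncated JSON result from a skill (missing closing brace).
result='{"status": "done", "next_phase": "review"'

# A probabilistic router would "interpret" this and move on.
# A deterministic one checks, and stops the moment the check fails.
if printf '%s' "$result" | python3 -m json.tool >/dev/null 2>&1; then
  verdict="transition"
else
  verdict="halt"   # fail loudly now, not three phases later
fi

echo "verdict: $verdict"
```

Here the truncated JSON produces a halt immediately, which is exactly the failure mode the Claude-based orchestrator papered over.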

The Insight That Mattered

Everything extracted to reference files — that was right.

Methodology as a folder of markdown files. Swap the folder, run a different workflow on the same infrastructure.

The bash loop took exactly that idea.

hltm-loop -e claude-code:opus -p prompts/dev/develop.md -p prompts/dev/review.md

Same concept. Prompts are files. Swap them, different methodology. The runner doesn’t care what’s inside.

The difference: the runner is bash, not Claude. Bash reads a signal tag and routes. Deterministic. Zero probability of “deciding” to do something else.
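Signal-tag routing, sketched. The tag format and prompt paths are assumptions for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: route on a signal tag in the model's output.
set -euo pipefail

# Simulated tail of a model run.
output='...work log...
<signal>REVIEW</signal>'

# Extract the tag. Pure pattern matching, no interpretation.
signal=$(printf '%s\n' "$output" | sed -n 's/.*<signal>\(.*\)<\/signal>.*/\1/p')

case "$signal" in
  CONTINUE) next_prompt="prompts/dev/develop.md" ;;
  REVIEW)   next_prompt="prompts/dev/review.md" ;;
  DONE)     next_prompt="" ;;
  *) echo "no recognizable signal, stopping" >&2; exit 1 ;;
esac

echo "next prompt: ${next_prompt:-done}"
```

Same transition table as before, but now a missing or mangled tag stops the loop instead of becoming a judgment call.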

The architecture of v6 was right.

The runtime was wrong.


Next: Every AI dev methodology I tested. None survived.