Three Sequence Models. All Failed.
I asked deep learning to predict intent. Deep learning said: “lol no.”
I trained three different neural networks to predict who would buy. All three agreed: “We have absolutely no fucking clue.”
I thought: “Maybe my model is too simple.” So I built a simple sequence model. Then a gated sequence model, the kind with forget gates. Then a bidirectional sequence model.
The result:
- Simple sequence model: 0.52 prediction accuracy (Random)
- Gated sequence model: 0.54 prediction accuracy (Random)
- Bidirectional sequence model: 0.57 prediction accuracy (Slightly less random. Still useless.)
In a funnel with 1.5% conversion, 0.57 prediction accuracy is indistinguishable from chaos. I threw the entire deep learning textbook at the problem. The problem didn’t care.
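Those scores only mean something relative to chance. Here is a minimal sketch (all data invented; assuming an AUC-style metric where 0.5 is a coin flip) of why a model that has learned nothing scores about 0.5 no matter how imbalanced the funnel is:

```python
import random

random.seed(42)

# Hypothetical funnel: 10,000 sessions, ~1.5% convert.
labels = [1 if random.random() < 0.015 else 0 for _ in range(10_000)]
# A model that has learned nothing, i.e. random scores.
scores = [random.random() for _ in labels]

def auc(labels, scores):
    """Chance that a random converter is ranked above a random non-converter."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

print(round(auc(labels, scores), 3))  # lands near 0.5, i.e. chance
```

Against that baseline, 0.52 and 0.54 are noise, and 0.57 is noise with better marketing.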
The Architecture Wars
I spent a week tuning hyperparameters. I argued with myself about “forget gates” and “hidden states.” I spent more time tuning forget-gate decay than fixing my damn funnel.
The Logic:
- Simple sequence model: Fast, efficient. Expected to capture short-term patterns.
- Gated sequence model: Good at long-term dependencies. Expected to still remember the “Quiz” by the time it reached the “Paywall.”
- Bidirectional sequence model: Reads the sequence forward and backward. Expected to understand the “whole context.”
The Reality: They all learned the same thing: nothing.
The “Grammar” Trap
Sequence models, GPT included, are built for language. Language has grammar.
- “The cat” is usually followed by a verb.
- “If” is usually followed by “then.”
Funnels don’t have grammar. They barely have logic. Some users go 3 → 4 → 5. Some go 3 → 17 → 2 → rage-exit.
There is no “syntax” to a panic-click. My models were trying to learn the grammar of a drunk person.
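The drunk-grammar point can be made concrete. A toy sketch (everything below is invented): a “grammatical” process where step n is almost always followed by step n + 1, versus panic-click sessions where any step can follow any step. The entropy of the next-step distribution tells them apart:

```python
import math
import random
from collections import Counter, defaultdict

random.seed(0)

def next_step_entropy(sequences):
    """Average entropy (bits) of what follows each step, across all steps."""
    following = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            following[a][b] += 1
    entropies = []
    for counts in following.values():
        total = sum(counts.values())
        entropies.append(
            -sum((c / total) * math.log2(c / total) for c in counts.values())
        )
    return sum(entropies) / len(entropies)

# "Grammar": step n is almost always followed by step n + 1 (10% noise).
grammar = [[s if random.random() < 0.9 else random.randrange(8)
            for s in range(8)] for _ in range(500)]

# "Funnel": panic-clicks, any of 8 steps can follow any other.
funnel = [[random.randrange(8) for _ in range(8)] for _ in range(500)]

print(next_step_entropy(grammar))  # low: the next step is predictable
print(next_step_entropy(funnel))   # near log2(8) = 3 bits: chaos
```

A sequence model feeds on the first kind of structure. My funnel served it the second kind.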
The Clustering Paradox (Again)
Just like before, the models learned structure.
- They grouped “long sessions” together.
- They grouped “short sessions” together.
They couldn’t tell the difference between:
- A long session because the user is interested.
- A long session because the user is confused.
Some “long sessions” were users reading carefully. Others were users stuck on the weight slider.
To the model, “time spent” is just a number. To a founder, “time spent” is either engagement or friction. The model couldn’t tell the difference.
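Here is that paradox in miniature (all numbers invented): two groups of users with identical session-length distributions and opposite outcomes. A brute-force search over every possible duration threshold, in either direction (“long means buy” or “long means leave”), still lands near a coin flip:

```python
import random

random.seed(1)

# Hypothetical sessions: 200 "interested" users who buy, 200 "confused"
# users who leave. Durations are drawn from the same distribution,
# so duration carries no information about the label.
durations = [random.gauss(300, 40) for _ in range(400)]
labels = [1] * 200 + [0] * 200  # 1 = bought, 0 = left

def best_threshold_accuracy(xs, ys):
    """Best accuracy any single duration threshold can achieve."""
    best = 0.0
    for t in xs:
        preds = [1 if x >= t else 0 for x in xs]       # "long means buy"
        for p in (preds, [1 - v for v in preds]):      # and "long means leave"
            acc = sum(a == b for a, b in zip(p, ys)) / len(ys)
            best = max(best, acc)
    return best

print(round(best_threshold_accuracy(durations, labels), 3))  # barely above 0.5
```

No architecture, however deep or bidirectional, can recover a distinction the feature never encoded.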
The Lesson
Complexity is not a solution.
I thought a more complex model (the bidirectional one) would find the signal that the simple one missed. But if the signal isn’t in the sequence, no amount of complexity will find it.
I was digging a hole in the wrong place. I wasn’t solving the problem. I was industrializing the mistake.