User Intent Prediction #6: Three Sequence Models. All Failed.
I asked deep learning to predict intent. Deep learning said: “lol no.”
I trained three different neural networks to predict who would buy. All three agreed: “We have absolutely no fucking clue.”
I thought: “Maybe my model is too simple.” So I built a sequence model. Then I built a second sequence model. Then I built a bidirectional sequence model.
The result:
- first sequence model: 0.52 prediction accuracy (Random)
- second sequence model: 0.54 prediction accuracy (Random)
- bidirectional sequence model: 0.57 prediction accuracy (Slightly less random, but still useless)
In a funnel with 1.5% conversion, 0.57 prediction accuracy is indistinguishable from chaos. I threw the entire deep learning textbook at the problem. The problem didn’t care.
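To put a number on “indistinguishable from chaos,” here is a back-of-the-envelope sketch. It assumes the generous reading of “0.57 accuracy”: the model is right 57% of the time for buyers and non-buyers alike. Bayes’ rule then gives the precision of a “will buy” flag at a 1.5% base rate:

```python
# Hypothetical numbers from the post: 1.5% conversion, 0.57 accuracy,
# read generously as 57% sensitivity AND 57% specificity.
base_rate = 0.015
sensitivity = 0.57     # P(flagged | buyer)
specificity = 0.57     # P(not flagged | non-buyer)

flagged_buyers = base_rate * sensitivity
flagged_nonbuyers = (1 - base_rate) * (1 - specificity)

# Bayes' rule: of everyone the model flags, what fraction actually buys?
precision = flagged_buyers / (flagged_buyers + flagged_nonbuyers)
print(round(precision, 3))  # 0.02
```

Roughly 98% of the model’s “will buy” predictions are wrong, which is exactly what “useless” looks like in production.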
The Architecture Wars
I spent a week tuning hyperparameters. I argued with myself about “forget gates” and “hidden states.” I spent more time tuning forget-gate decay than fixing my damn funnel.
The Logic:
- first sequence model: Fast, efficient. Should capture short-term patterns.
- second sequence model: Good at long-term dependencies. Should remember the “Quiz” even at the “Paywall.”
- bidirectional sequence model: Reads the sequence forward and backward. Should understand the “whole context.”
The Reality: They all learned the same thing: nothing.
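For the record, the mechanical difference between the variants is smaller than a week of tuning suggests. A toy numpy sketch (untrained random weights, hypothetical vocabulary and hidden sizes, none of this is my actual model) of what a forward read vs. a forward-plus-backward read actually computes over a session of funnel-step ids:

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 8    # hypothetical hidden size
VOCAB = 20    # hypothetical number of funnel steps

# Untrained random weights -- this sketch shows the data flow, not learning.
W_in = rng.normal(scale=0.1, size=(VOCAB, HIDDEN))
W_h = rng.normal(scale=0.1, size=(HIDDEN, HIDDEN))

def encode(steps, reverse=False):
    """One plain tanh-RNN pass over a list of funnel-step ids."""
    h = np.zeros(HIDDEN)
    seq = reversed(steps) if reverse else steps
    for s in seq:
        h = np.tanh(W_in[s] + W_h @ h)
    return h

session = [3, 4, 5, 17, 2]
forward = encode(session)  # the unidirectional view
both = np.concatenate([encode(session), encode(session, reverse=True)])  # bidirectional view
print(forward.shape, both.shape)  # (8,) (16,)
```

The bidirectional variant just concatenates a second pass read right-to-left. If the left-to-right pass is encoding noise, the right-to-left pass encodes the same noise backwards.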
The “Grammar” Trap
Sequence models (like GPT) are built for language. Language has grammar.
- “The cat” is usually followed by a verb.
- “If” is usually followed by “then.”
Funnels don’t have grammar. They barely have logic. Some users go 3 → 4 → 5. Some go 3 → 17 → 2 → rage-exit.
There is no “syntax” to a panic-click. My models were trying to learn the grammar of a drunk person.
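You can actually put a number on “no grammar.” A sketch with made-up sequences: the average entropy of the next step given the current one is near zero for an orderly funnel and large for panic-clicking, which is the uncertainty the models were asked to predict through:

```python
import math
from collections import Counter, defaultdict

def next_step_entropy(sequences):
    """Average Shannon entropy (bits) of the next step given the current step."""
    transitions = defaultdict(Counter)
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            transitions[a][b] += 1
    entropies = []
    for counts in transitions.values():
        total = sum(counts.values())
        entropies.append(-sum(c / total * math.log2(c / total) for c in counts.values()))
    return sum(entropies) / len(entropies)

# Hypothetical data: users who march 1->2->3->4->5 vs. users who rage-click.
orderly = [[1, 2, 3, 4, 5]] * 50
chaotic = [[3, 17, 2, 9], [3, 4, 17, 1], [3, 2, 2, 8], [3, 17, 9, 4]] * 10

print(next_step_entropy(orderly))  # 0.0 -- the next step is fully determined
print(next_step_entropy(chaotic))  # high -- the drunk person's grammar
```

When the conditional entropy is this high, the best possible next-step predictor is barely better than a coin flip, no matter how many gates it has.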
The Clustering Paradox (Again)
Just like before, the models learned structure.
- They grouped “long sessions” together.
- They grouped “short sessions” together.
But they couldn’t tell the difference between:
- A long session because the user is interested.
- A long session because the user is confused.
Some “long sessions” were users reading carefully. Others were users stuck on the weight slider.
To the model, “time spent” is just a number. To a founder, “time spent” is either engagement or friction. The model couldn’t tell the difference.
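A toy illustration with hypothetical per-screen timings: two sessions with identical totals, opposite meanings. The aggregate “time spent” feature the model sees is literally the same number; only the per-step spread tells the stories apart:

```python
import statistics

# Hypothetical per-screen timings in seconds -- both sessions total 150s.
reading = [30, 28, 31, 29, 32]   # steady, careful reading
stuck   = [5, 5, 130, 5, 5]      # 130 seconds stalled on the weight slider

print(sum(reading), sum(stuck))  # 150 150 -- "time spent" sees no difference
print(statistics.pstdev(reading), statistics.pstdev(stuck))  # ~1.4 vs 50.0
```

Engagement and friction collapse into the same feature value, so the model has no way to learn the distinction the founder cares about.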
The Lesson
Complexity is not a solution.
I thought a more complex model (the bidirectional one) would find the signal the simpler ones missed. But if the signal isn’t in the sequence, no amount of complexity will find it.
I was digging a hole in the wrong place. I wasn’t solving the problem. I was industrializing the mistake.
This is why looking backward and forward didn’t help: adding more directions to the noise just gives you more noise.