User Intent Prediction #3: 99.5% Accuracy. Zero Usefulness.
I built a model that understood English perfectly. It just didn’t understand money.
I had a brilliant idea: use a massive pre-trained language model to understand user behavior. It’s read billions of sentences. It understands context. If it can understand “The cat sat on the mat,” surely it can understand “Quiz → Weight Goal → Paywall.”
The result: It worked perfectly. 99.5% relative performance. The catch: It was completely useless.
The Experiment
I took a giant model (trained on the entire internet) and fed it my funnel data. I compared it to a custom model I trained from scratch on my 44,000 users.
The Logic:
- Custom Model: Knows my funnel perfectly. Knows nothing else.
- Pre-trained Giant: Knows English perfectly. Never saw my funnel.
The Test: Can the Giant figure out my funnel without ever training on it?
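Here is the setup in sketch form. Assumptions are marked: the model names, the text serialization of funnel events, and the classification head are illustrative stand-ins, not the exact pipeline.

```python
# Sketch: frozen pre-trained encoder vs. tiny custom model on funnel sequences.
# Assumes funnel events are serialized as text ("quiz weight_goal paywall ...")
# for the pre-trained side; the real pipeline may have differed.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# --- Pre-trained giant: frozen encoder + small trainable head --------------
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative
encoder = AutoModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():
    p.requires_grad = False  # "never saw my funnel" stays true: no fine-tuning

head = nn.Linear(encoder.config.hidden_size, 1)  # predicts purchase yes/no

def giant_logits(event_names: list[str]) -> torch.Tensor:
    text = " ".join(event_names)  # e.g. "quiz weight_goal paywall"
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] embedding
    return head(hidden)

# --- Custom model: trained from scratch on the 44,000 users ----------------
class CustomFunnelModel(nn.Module):
    def __init__(self, n_screens: int, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(n_screens, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, screen_ids: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(self.embed(screen_ids))  # h: (layers, batch, dim)
        return self.out(h[-1])

# Note what both models consume: the ORDER of screens.
# Neither one sees how long the user lingered on each screen.
```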
The Result
| Model | Accuracy (relative to Custom) | Prediction Accuracy (absolute) |
|---|---|---|
| Custom Model | 100% | 0.54 |
| Pre-trained Giant | 99.5% | 0.54 |
Incredible! The Giant matched the Custom Model almost perfectly. Transfer learning works! The domain gap is zero!
The Problem: 0.54 is Garbage
There was just one small detail. A prediction accuracy of 0.54 is, for practical purposes, random guessing.
A coin flip is 0.50. My “perfect” transfer learning model was barely better than guessing “heads” for everyone.
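The sanity check I should have run on day one, in sketch form. The class balance here is illustrative; with a realistically skewed purchase rate, the trivial baselines are even harsher.

```python
# Sanity check: compare model accuracy against trivial baselines before
# celebrating. All numbers here are illustrative, not the actual data.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)  # assume ~balanced buy/no-buy
model_acc = 0.54                          # what both models scored

coin_flip = (rng.integers(0, 2, size=y_true.size) == y_true).mean()
majority = max(y_true.mean(), 1 - y_true.mean())  # always predict one class

print(f"model:     {model_acc:.2f}")
print(f"coin flip: {coin_flip:.2f}")  # ~0.50
print(f"majority:  {majority:.2f}")   # ~0.50 on balanced data
# If the model barely clears these baselines, it has learned almost nothing.
```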
The reality:
- Custom model: Useless.
- Pre-trained model: Equally useless.
- Transfer learning: Successfully transferred the uselessness.
Why It Failed
I spent days debugging. Was it the architecture? The size? The data format?
Then I looked at the business reality.
Language models look at order. “The cat sat on the mat.” If you change the order, the meaning changes.
Buyers look at timing. In a 60-screen funnel, the order is fixed. Everyone sees Screen 1, then Screen 2, then Screen 3. The signal isn’t in the order. It’s in the hesitation.
- User A: Rushes through. (Curious).
- User B: Pauses. Thinks. Clicks. (Desperate).
The Language Model saw “Screen 1 → Screen 2”. It ignored the pause. It understood the sentence perfectly. But the sentence didn’t contain the answer.
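To make that concrete: strip the timestamps, and User A and User B become literally the same input. A sketch, with hypothetical field names:

```python
# Two users, identical screen order, wildly different hesitation.
# Field names and dwell times (seconds) are hypothetical.
user_a = [("screen_1", 0.8), ("screen_2", 0.6), ("screen_3", 0.9)]    # rushes
user_b = [("screen_1", 0.8), ("screen_2", 41.0), ("screen_3", 12.5)]  # hesitates

def to_sentence(events):
    """What the language model actually received: order only."""
    return " -> ".join(screen for screen, _dwell in events)

assert to_sentence(user_a) == to_sentence(user_b)  # the signal is gone
print(to_sentence(user_a))  # "screen_1 -> screen_2 -> screen_3"

# The feature that separates buyers from browsers never made it in:
dwell_gap = sum(d for _, d in user_b) - sum(d for _, d in user_a)
print(f"hesitation difference the model never saw: {dwell_gap:.1f}s")
```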
The Lesson
High accuracy on the wrong task is useless.
I built a system that was 10x heavier and 10x slower to achieve the exact same failure.
I proved that transfer learning works for funnel sequences. I also proved that funnel sequences don’t predict purchases.
Sometimes you climb the wrong mountain really, really fast.
This leads directly to why 32 dimensions compressed to 8 and broke everything: when the signal isn’t there, no amount of math will find it.