User Intent Prediction #3: 99.5% Accuracy. Zero Usefulness.
I built a model that understood English perfectly. It just didn’t understand money.
I had a brilliant idea: use a massive pre-trained language model to understand user behavior. It’s read billions of sentences. It understands context. If it can understand “The cat sat on the mat,” surely it can understand “Quiz → Weight Goal → Paywall.”
The result: It worked perfectly. 99.5% relative performance. The catch: It was completely useless.
The Experiment
I took a giant model (trained on the entire internet) and fed it my funnel data. I compared it to a custom model I trained from scratch on my 44,000 users.
The Logic:
- Custom Model: Knows my funnel perfectly. Knows nothing else.
- Pre-trained Giant: Knows English perfectly. Never saw my funnel.
The Test: Can the Giant figure out my funnel without ever training on it?
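Here is the setup in sketch form. Assumptions are marked: the model names, the text serialization of funnel events, and the classification head are illustrative stand-ins, not the exact pipeline.

```python
# Sketch: frozen pre-trained encoder vs. tiny custom model on funnel sequences.
# Assumes funnel events are serialized as text ("quiz weight_goal paywall ...")
# for the pre-trained side; the real pipeline may have differed.
import torch
import torch.nn as nn
from transformers import AutoTokenizer, AutoModel

# --- Pre-trained giant: frozen encoder + small trainable head --------------
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # illustrative
encoder = AutoModel.from_pretrained("bert-base-uncased")
for p in encoder.parameters():
    p.requires_grad = False  # "never saw my funnel" stays true: no fine-tuning

head = nn.Linear(encoder.config.hidden_size, 1)  # predicts purchase yes/no

def giant_logits(event_names: list[str]) -> torch.Tensor:
    text = " ".join(event_names)  # e.g. "quiz weight_goal paywall"
    batch = tokenizer(text, return_tensors="pt", truncation=True)
    hidden = encoder(**batch).last_hidden_state[:, 0]  # [CLS] embedding
    return head(hidden)

# --- Custom model: trained from scratch on the 44,000 users ----------------
class CustomFunnelModel(nn.Module):
    def __init__(self, n_screens: int, dim: int = 32):
        super().__init__()
        self.embed = nn.Embedding(n_screens, dim)
        self.gru = nn.GRU(dim, dim, batch_first=True)
        self.out = nn.Linear(dim, 1)

    def forward(self, screen_ids: torch.Tensor) -> torch.Tensor:
        _, h = self.gru(self.embed(screen_ids))  # h: (layers, batch, dim)
        return self.out(h[-1])

# Note what both models consume: the ORDER of screens.
# Neither one sees how long the user lingered on each screen.
```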
The Result
| Model | Accuracy (relative to Custom) | Prediction Accuracy (absolute) |
|---|---|---|
| Custom Model | 100% | 0.54 |
| Pre-trained Giant | 99.5% | 0.54 |
Incredible! The Giant matched the Custom Model almost perfectly. Transfer learning works! The domain gap is zero!
The Problem: 0.54 is Garbage
There was just one small detail. A prediction accuracy of 0.54 is, for practical purposes, random guessing.
A coin flip is 0.50. My “perfect” transfer learning model was barely better than guessing “heads” for everyone.
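The sanity check I should have run on day one, in sketch form. The class balance here is illustrative; with a realistically skewed purchase rate, the trivial baselines are even harsher.

```python
# Sanity check: compare model accuracy against trivial baselines before
# celebrating. All numbers here are illustrative, not the actual data.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=10_000)  # assume ~balanced buy/no-buy
model_acc = 0.54                          # what both models scored

coin_flip = (rng.integers(0, 2, size=y_true.size) == y_true).mean()
majority = max(y_true.mean(), 1 - y_true.mean())  # always predict one class

print(f"model:     {model_acc:.2f}")
print(f"coin flip: {coin_flip:.2f}")  # ~0.50
print(f"majority:  {majority:.2f}")   # ~0.50 on balanced data
# If the model barely clears these baselines, it has learned almost nothing.
```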
The reality:
- Custom model: Useless.
- Pre-trained model: Equally useless.
- Transfer learning: Successfully transferred the uselessness.
Why It Failed
I spent days debugging. Was it the architecture? The size? The data format?
Then I looked at the business reality.
Language models look at order. “The cat sat on the mat.” If you change the order, the meaning changes.
Buyers look at timing. In a 60-screen funnel, the order is fixed. Everyone sees Screen 1, then Screen 2, then Screen 3. The signal isn’t in the order. It’s in the hesitation.
- User A: Rushes through. (Curious).
- User B: Pauses. Thinks. Clicks. (Desperate).
The Language Model saw “Screen 1 → Screen 2”. It ignored the pause. It understood the sentence perfectly. But the sentence didn’t contain the answer.
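To make that concrete: strip the timestamps, and User A and User B become literally the same input. A sketch, with hypothetical field names:

```python
# Two users, identical screen order, wildly different hesitation.
# Field names and dwell times (seconds) are hypothetical.
user_a = [("screen_1", 0.8), ("screen_2", 0.6), ("screen_3", 0.9)]    # rushes
user_b = [("screen_1", 0.8), ("screen_2", 41.0), ("screen_3", 12.5)]  # hesitates

def to_sentence(events):
    """What the language model actually received: order only."""
    return " -> ".join(screen for screen, _dwell in events)

assert to_sentence(user_a) == to_sentence(user_b)  # the signal is gone
print(to_sentence(user_a))  # "screen_1 -> screen_2 -> screen_3"

# The feature that separates buyers from browsers never made it in:
dwell_gap = sum(d for _, d in user_b) - sum(d for _, d in user_a)
print(f"hesitation difference the model never saw: {dwell_gap:.1f}s")
```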
The Lesson
High accuracy on the wrong task is useless.
I built a system that was 10x heavier and 10x slower to achieve the exact same failure.
I proved that transfer learning works for funnel sequences. I also proved that funnel sequences don’t predict purchases.
Sometimes you climb the wrong mountain really, really fast.
This leads directly to why 32 dimensions compressed to 8 and broke everything: when the signal isn’t there, no amount of math will find it.