Sergey Kopanev - Entrepreneur & Systems Architect

User Intent Prediction #14: Calibration Fix.


My model was a pathological liar. It looked me in the eye and said: “This user has a 90% chance to buy!”

The Reality: They had a 5% chance.

My model was overconfident. It pushed all predictions to the extremes (0 or 1). It had no nuance.

The Problem: If I give a discount to users with “High Intent” (> 80%), and my model says everyone is > 80%, I go bankrupt.

The Overconfidence Problem

Example:

  • 100 users with model score > 0.9 (“Very High Intent”).
  • Expected: 90 sales (if the model is honest).
  • Reality: 5 sales.

The model was screaming “BUY! BUY! BUY!” for users who were just browsing.

If I gave all 100 users a discount, I would:

  • Waste: 95 discounts on non-buyers.
  • Lose: $950 in margin.
  • Gain: 5 sales (maybe).

Not a good trade.
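
How do you catch a liar like this? Bucket the predictions and compare claim to reality. A minimal sketch in Python, using synthetic data as a stand-in for a real validation set:

```python
import numpy as np

# Synthetic stand-in for a validation set: raw model scores and real outcomes.
rng = np.random.default_rng(0)
scores = rng.uniform(0, 1, 10_000)                        # overconfident raw scores
bought = (rng.uniform(0, 1, 10_000) < 0.05).astype(int)   # ~5% actually buy

# Bucket predictions by score; compare what the model claims to what happened.
bins = np.linspace(0, 1, 11)
for lo, hi in zip(bins[:-1], bins[1:]):
    m = (scores >= lo) & (scores < hi)
    if m.any():
        print(f"score {lo:.1f}-{hi:.1f}: "
              f"claimed ~{scores[m].mean():.2f}, actual {bought[m].mean():.2f}")
```

A calibrated model prints two matching columns. An overconfident one prints 0.95 next to 0.05.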

The Fix: Isotonic Regression

I needed to teach the model humility. I used Isotonic Regression. It maps the model’s raw scores to real probabilities.

How it works:

  1. Take all predictions from the validation set.
  2. Group them by score (e.g., all users with score 0.8-0.9).
  3. Calculate the actual conversion rate for each group.
  4. Fit a non-decreasing step function through those rates (the “isotonic” constraint is what preserves ranking) and use it as the lookup table: Raw Score → Calibrated Probability.

Example:

  • Before: model says 0.9 → actual probability 0.05 (Liar)
  • After: calibrated model says 0.9 → actual probability 0.88 (Honest)
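
In code, that whole recipe is a few lines with scikit-learn’s IsotonicRegression. A minimal sketch on synthetic data, not my production pipeline:

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression

# Synthetic validation data: raw scores and actual conversions.
rng = np.random.default_rng(42)
raw_scores = rng.uniform(0, 1, 5_000)
# True conversion probability is far below the raw score (overconfidence).
actual = (rng.uniform(0, 1, 5_000) < 0.1 * raw_scores).astype(int)

# Fit a non-decreasing step function: raw score -> honest probability.
# out_of_bounds="clip" handles serve-time scores outside the fitted range.
calibrator = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
calibrator.fit(raw_scores, actual)

# The fitted function IS the lookup table.
print(calibrator.predict([0.2, 0.5, 0.9]))  # real probabilities, not raw scores
```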

The Result:

  • Calibration Error: Reduced by 99.5%.
  • Ranking Quality: Unchanged (still 0.96).

The model didn’t get “better” at ranking. It just got honest about its confidence.
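
Both numbers are easy to check. Isotonic regression is a monotone map, so at worst it merges neighboring scores into ties; ranking metrics like ROC-AUC barely move while the calibration error collapses. A self-contained sketch on synthetic data (`ece` is a small helper defined here, a standard expected-calibration-error estimate):

```python
import numpy as np
from sklearn.isotonic import IsotonicRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(7)
raw = rng.uniform(0, 1, 20_000)
y = (rng.uniform(0, 1, 20_000) < 0.1 * raw).astype(int)  # overconfident model

# In production you would fit on a held-out split; same data here for brevity.
calibrated = IsotonicRegression(out_of_bounds="clip").fit(raw, y).predict(raw)

def ece(p, y, n_bins=10):
    """Expected calibration error: bin-weighted |claimed - actual| gap."""
    edges = np.linspace(0, 1, n_bins + 1)
    gap = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        m = (p >= lo) & (p < hi)
        if m.any():
            gap += m.sum() / len(p) * abs(p[m].mean() - y[m].mean())
    return gap

print(f"ECE: {ece(raw, y):.4f} -> {ece(calibrated, y):.4f}")        # collapses
print(f"AUC: {roc_auc_score(y, raw):.3f} -> {roc_auc_score(y, calibrated):.3f}")
```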

Why Platt Scaling Failed

I tried Platt Scaling (a logistic regression fit on the raw scores) first. It assumes the miscalibration follows one nice, smooth S-curve (a sigmoid).

It wasn’t. My error was jagged and messy.

The Calibration Plot:

  • Ideal: A straight diagonal line (Predicted = Actual).
  • My Model (Before): A hockey stick (flat at 0, then jumps to 1).
  • Platt Scaling: Tried to force a smooth S-curve onto it. The fit was poor, so the probabilities stayed wrong.
  • Isotonic Regression: Fit the jagged mess perfectly. Preserved the ranking quality.

Isotonic Regression is “non-parametric.” It assumes nothing about the shape, only that a higher score means a higher probability. It just fits the data. It worked perfectly.
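
If you want to see the shape problem yourself, fit both calibrators on the same scores and compare their worst calibration gaps. A sketch on synthetic “hockey stick” data, with Platt scaling implemented as a plain logistic regression on the raw score:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.isotonic import IsotonicRegression
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(1)
raw = rng.uniform(0, 1, 20_000)
# "Hockey stick" truth: near-zero conversion until the very top scores.
y = (rng.uniform(0, 1, 20_000) < np.where(raw > 0.95, 0.8, 0.02)).astype(int)

# Platt scaling: one sigmoid over the raw score.
p_platt = (LogisticRegression()
           .fit(raw.reshape(-1, 1), y)
           .predict_proba(raw.reshape(-1, 1))[:, 1])

# Isotonic: a free-form monotone step function.
p_iso = IsotonicRegression(out_of_bounds="clip").fit(raw, y).predict(raw)

for name, p in [("platt", p_platt), ("isotonic", p_iso)]:
    actual, claimed = calibration_curve(y, p, n_bins=10)
    print(f"{name:9s} worst-bin gap: {np.abs(actual - claimed).max():.3f}")
```

The sigmoid smooths right over the jump; the step function snaps to it.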

The Lesson

Accuracy is vanity. Calibration is sanity.

A model with 0.90 prediction accuracy that lies about probabilities is dangerous. A model with 0.70 prediction accuracy that tells the truth is profitable.

Don’t trust the raw score. Make the model prove it.


This concludes the “Pivot” of November. Next month, we face the final boss: Position 40.