Calibration Fix.
My model was a pathological liar. It looked me in the eye and said: “This user has a 90% chance to buy!”
The Reality: They had a 5% chance.
My model was overconfident. It pushed all predictions to the extremes (0 or 1). It had no nuance.
The Problem: If I give a discount to users with “High Intent” (> 80%), and my model says everyone is > 80%, I go bankrupt.
The Overconfidence Problem
Example:
- 100 users with model score > 0.9 (“Very High Intent”).
- Expected: 90 sales (if the model is honest).
- Reality: 5 sales.
The model was screaming “BUY! BUY! BUY!” for users who were just browsing.
If I gave all 100 users a discount, I would:
- Waste: 95 discounts on non-buyers.
- Lose: $950 in margin.
- Gain: 5 sales (maybe).
Not a good trade.
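In code, the back-of-the-envelope looks like this. A minimal sketch; the $10 of margin per discount is inferred from the numbers above ($950 / 95 discounts), not stated explicitly:

```python
# Back-of-the-envelope cost of trusting an overconfident score.
# discount_cost is inferred from the figures above ($950 / 95 discounts).
users_targeted  = 100   # everyone the model scored > 0.9
actual_buy_rate = 0.05  # what really happened
discount_cost   = 10    # dollars of margin given up per wasted discount

sales  = int(users_targeted * actual_buy_rate)  # 5 sales (maybe)
wasted = users_targeted - sales                 # 95 wasted discounts
lost   = wasted * discount_cost                 # $950 in margin

print(f"sales={sales}, wasted discounts={wasted}, margin lost=${lost}")
```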
The Fix: Isotonic Regression
I needed to teach the model humility. I used Isotonic Regression. It maps the model’s raw scores to real probabilities.
How it works:
- Take all predictions from the validation set, sorted by raw score.
- Group adjacent scores and calculate the actual conversion rate for each group.
- Merge any groups that break monotonicity, so a higher score never maps to a lower probability.
- The result is a lookup table:
Raw Score → Calibrated Probability.
Example:
- Before: the model says 0.9 → actual conversion rate 0.05 (Liar)
- After: the calibrated model says 0.9 → actual conversion rate 0.88 (Honest)
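Here is a minimal sketch of that fit with scikit-learn's IsotonicRegression. The toy scores and labels are mine; in practice you would feed in your real validation-set scores and outcomes:

```python
# Minimal isotonic calibration sketch (toy data; real inputs would be
# the raw validation-set scores and the actual buy/no-buy outcomes).
import numpy as np
from sklearn.isotonic import IsotonicRegression

val_scores = np.array([0.95, 0.92, 0.91, 0.90, 0.45, 0.30, 0.10, 0.05])
val_labels = np.array([1,    0,    0,    0,    1,    0,    0,    0])

# Fit a non-decreasing step function: raw score -> calibrated probability.
# out_of_bounds="clip" handles inference scores outside the fitted range.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(val_scores, val_labels)

# At inference time, calibrate the raw score before acting on it.
print(iso.predict([0.9]))  # the "honest" probability for a raw 0.9
```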
The Result:
- Calibration Error: Reduced by 99.5%.
- Ranking Quality: Unchanged (still 0.96).
The model didn’t get “better” at ranking. It just got honest about its confidence.
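Both claims are easy to check yourself. Below is one rough way to compute a calibration error and the ranking metric side by side; the ECE implementation, the 10-bin choice, and the monotone stand-in for calibrated scores are my assumptions, not from the post:

```python
# Rough check: calibration error should drop, ranking (AUC) should not move.
# The ECE implementation and 10-bin choice are assumptions, not from the post.
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(probs, labels, n_bins=10):
    """Per-bin |mean predicted - actual rate|, weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece

# Toy overconfident model; a monotone remap stands in for calibration here.
labels = np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
raw    = np.array([0.95, 0.93, 0.92, 0.91, 0.90, 0.35, 0.30, 0.20, 0.15, 0.10])
calibrated = raw ** 4  # any monotone mapping leaves AUC untouched

print("ECE raw / calibrated:", expected_calibration_error(raw, labels),
      expected_calibration_error(calibrated, labels))
print("AUC raw / calibrated:", roc_auc_score(labels, raw),
      roc_auc_score(labels, calibrated))
```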
Why Platt Scaling Failed
I tried Platt Scaling first: it fits a logistic regression to the raw scores, so it assumes the miscalibration follows a nice, smooth sigmoid curve.
Mine didn’t. My calibration error was jagged and messy.
The Calibration Plot:
- Ideal: A straight diagonal line (Predicted = Actual).
- My Model (Before): A hockey stick (flat at 0, then jumps to 1).
- Platt Scaling: Tried to force a smooth S-curve through the jagged mess. Failed. The calibrated probabilities were still far from reality.
- Isotonic Regression: Fit the jagged mess perfectly. Preserved the ranking quality.
Isotonic Regression is “non-parametric.” It doesn’t assume a shape, only that the mapping never decreases. It just fits the data. It worked perfectly.
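If you want to see your own hockey stick, draw the reliability diagram. A sketch with synthetic miscalibrated scores standing in for a real validation set:

```python
# Reliability diagram sketch; synthetic scores stand in for real validation data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
val_scores = rng.uniform(0, 1, 5000)
# True buy probability is much lower than the score: an overconfident model.
val_labels = (rng.uniform(0, 1, 5000) < val_scores ** 3).astype(int)

frac_pos, mean_pred = calibration_curve(val_labels, val_scores, n_bins=10)

plt.plot([0, 1], [0, 1], "k--", label="Ideal (Predicted = Actual)")
plt.plot(mean_pred, frac_pos, "o-", label="My model (before)")
plt.xlabel("Predicted probability")
plt.ylabel("Actual conversion rate")
plt.legend()
plt.show()
```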
The Lesson
Accuracy is vanity. Calibration is sanity.
A model with 0.90 prediction accuracy that lies about probabilities is dangerous. A model with 0.70 prediction accuracy that tells the truth is profitable.
Don’t trust the raw score. Make the model prove it.