Calibration Fix.
My model was a pathological liar. It looked me in the eye and said: “This user has a 90% chance to buy!”
The Reality: They had a 5% chance.
My model was overconfident. It pushed all predictions to the extremes (0 or 1). It had no nuance.
The Problem: If I give a discount to users with “High Intent” (> 80%), and my model says everyone is > 80%, I go bankrupt.
The Overconfidence Problem
Example:
- 100 users with model score > 0.9 (“Very High Intent”).
- Expected: 90 sales (if the model is honest).
- Reality: 5 sales.
The model was screaming “BUY! BUY! BUY!” for users who were just browsing.
If I gave all 100 users a discount, I would:
- Waste: 95 discounts on non-buyers.
- Lose: $950 in margin.
- Gain: 5 sales (maybe).
Not a good trade.
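In code, the back-of-the-envelope looks like this. A minimal sketch; the $10 of margin per discount is inferred from the numbers above ($950 / 95 discounts), not stated explicitly:

```python
# Back-of-the-envelope cost of trusting an overconfident score.
# discount_cost is inferred from the figures above ($950 / 95 discounts).
users_targeted  = 100   # everyone the model scored > 0.9
actual_buy_rate = 0.05  # what really happened
discount_cost   = 10    # dollars of margin given up per wasted discount

sales  = int(users_targeted * actual_buy_rate)  # 5 sales (maybe)
wasted = users_targeted - sales                 # 95 wasted discounts
lost   = wasted * discount_cost                 # $950 in margin

print(f"sales={sales}, wasted discounts={wasted}, margin lost=${lost}")
```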
The Fix: Isotonic Regression
I needed to teach the model humility. I used Isotonic Regression. It maps the model’s raw scores to real probabilities.
How it works:
- Take all predictions from the validation set, sorted by raw score.
- Group adjacent scores and calculate the actual conversion rate for each group.
- Merge any groups that break monotonicity, so a higher score never maps to a lower probability.
- The result is a lookup table:
Raw Score → Calibrated Probability.
Example:
- Before: the model says 0.9 → actual conversion rate 0.05 (Liar)
- After: the calibrated model says 0.9 → actual conversion rate 0.88 (Honest)
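Here is a minimal sketch of that fit with scikit-learn's IsotonicRegression. The toy scores and labels are mine; in practice you would feed in your real validation-set scores and outcomes:

```python
# Minimal isotonic calibration sketch (toy data; real inputs would be
# the raw validation-set scores and the actual buy/no-buy outcomes).
import numpy as np
from sklearn.isotonic import IsotonicRegression

val_scores = np.array([0.95, 0.92, 0.91, 0.90, 0.45, 0.30, 0.10, 0.05])
val_labels = np.array([1,    0,    0,    0,    1,    0,    0,    0])

# Fit a non-decreasing step function: raw score -> calibrated probability.
# out_of_bounds="clip" handles inference scores outside the fitted range.
iso = IsotonicRegression(y_min=0.0, y_max=1.0, out_of_bounds="clip")
iso.fit(val_scores, val_labels)

# At inference time, calibrate the raw score before acting on it.
print(iso.predict([0.9]))  # the "honest" probability for a raw 0.9
```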
The Result:
- Calibration Error: Reduced by 99.5%.
- Ranking Quality: Unchanged (still 0.96).
The model didn’t get “better” at ranking. It just got honest about its confidence.
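Both claims are easy to check yourself. Below is one rough way to compute a calibration error and the ranking metric side by side; the ECE implementation, the 10-bin choice, and the monotone stand-in for calibrated scores are my assumptions, not from the post:

```python
# Rough check: calibration error should drop, ranking (AUC) should not move.
# The ECE implementation and 10-bin choice are assumptions, not from the post.
import numpy as np
from sklearn.metrics import roc_auc_score

def expected_calibration_error(probs, labels, n_bins=10):
    """Per-bin |mean predicted - actual rate|, weighted by bin size."""
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (probs >= lo) & (probs < hi)
        if mask.any():
            ece += mask.mean() * abs(probs[mask].mean() - labels[mask].mean())
    return ece

# Toy overconfident model; a monotone remap stands in for calibration here.
labels = np.array([1, 0, 0, 0, 0, 1, 0, 0, 0, 0])
raw    = np.array([0.95, 0.93, 0.92, 0.91, 0.90, 0.35, 0.30, 0.20, 0.15, 0.10])
calibrated = raw ** 4  # any monotone mapping leaves AUC untouched

print("ECE raw / calibrated:", expected_calibration_error(raw, labels),
      expected_calibration_error(calibrated, labels))
print("AUC raw / calibrated:", roc_auc_score(labels, raw),
      roc_auc_score(labels, calibrated))
```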
Why Platt Scaling Failed
I tried Platt Scaling first: it fits a logistic regression to the raw scores, so it assumes the miscalibration follows a nice, smooth sigmoid curve.
Mine didn’t. My calibration error was jagged and messy.
The Calibration Plot:
- Ideal: A straight diagonal line (Predicted = Actual).
- My Model (Before): A hockey stick (flat at 0, then jumps to 1).
- Platt Scaling: Tried to force a smooth S-curve through the jagged mess. Failed. The calibrated probabilities were still far from reality.
- Isotonic Regression: Fit the jagged mess perfectly. Preserved the ranking quality.
Isotonic Regression is “non-parametric.” It doesn’t assume a shape, only that the mapping never decreases. It just fits the data. It worked perfectly.
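If you want to see your own hockey stick, draw the reliability diagram. A sketch with synthetic miscalibrated scores standing in for a real validation set:

```python
# Reliability diagram sketch; synthetic scores stand in for real validation data.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.calibration import calibration_curve

rng = np.random.default_rng(0)
val_scores = rng.uniform(0, 1, 5000)
# True buy probability is much lower than the score: an overconfident model.
val_labels = (rng.uniform(0, 1, 5000) < val_scores ** 3).astype(int)

frac_pos, mean_pred = calibration_curve(val_labels, val_scores, n_bins=10)

plt.plot([0, 1], [0, 1], "k--", label="Ideal (Predicted = Actual)")
plt.plot(mean_pred, frac_pos, "o-", label="My model (before)")
plt.xlabel("Predicted probability")
plt.ylabel("Actual conversion rate")
plt.legend()
plt.show()
```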
The Lesson
Accuracy is vanity. Calibration is sanity.
A model with 0.90 prediction accuracy that lies about probabilities is dangerous. A model with 0.70 prediction accuracy that tells the truth is profitable.
Don’t trust the raw score. Make the model prove it.