Product · AI · May 19, 2026 · 8 min read

Why our scoring model dropped 40 points after recalibration

A transparent product post about the most uncomfortable model adjustment we have ever shipped, and the reason it made the product better.

Dana Whitfield

Key Account Manager


Last month we shipped a scoring model recalibration that dropped the average score across the system by roughly 40 points. A signal that used to read as a 92 now reads as a 51. We knew this would feel bad. We did it anyway. Here is the long version of why.

The original model

We launched the scoring model two years ago with calibration tuned against our earliest customers. That cohort, by accident, was heavily weighted toward integrators serving warehouse and distribution. The model learned that big square footage and freight LLCs were excellent signals, and it scored them highly. That calibration was right for the cohort at the time.

What changed

Two years of growth widened the customer base. We now have integrators in education, healthcare, cannabis, and a long tail of mixed verticals. The original calibration kept giving high scores to warehouse signals, even when the customer was a school district integrator who could not act on them. The dashboard looked busy. The pipeline did not move.

Why the drop

We recalibrated against the broader customer base, not the founding cohort.
We tightened the trigger window weighting from 120 days to 60 days, which dropped many older signals out of the upper band.
We added a hard penalty for accounts where we could not identify a probable buyer within two layers of the org.
We stopped giving credit for industry fit alone. Industry fit is necessary but not sufficient. Real triggers are sufficient.
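
As a rough illustration of how the last three changes could combine, here is a minimal Python sketch. It is not the production model. The field names, the linear recency decay, the 0 to 100 scale, and the 30-point penalty are all assumptions made for the example; only the 120-day and 60-day windows and the two-layer buyer rule come from the list above. The first change, recalibrating against the broader customer base, is a choice of training data and is not shown.

```python
from dataclasses import dataclass
from typing import Optional

OLD_WINDOW_DAYS = 120     # trigger window before the recalibration (from the post)
NEW_WINDOW_DAYS = 60      # trigger window after the recalibration (from the post)
MAX_BUYER_LAYERS = 2      # probable buyer must be within two org layers (from the post)
NO_BUYER_PENALTY = 30.0   # hypothetical penalty size, chosen only for illustration


@dataclass
class Account:
    industry_fit: bool                # does the account match the customer's vertical?
    trigger_strength: float           # hypothetical 0-100 strength of the latest trigger
    trigger_age_days: Optional[int]   # age of the latest trigger; None if there is none
    buyer_layers: Optional[int]       # org layers to a probable buyer; None if unknown


def recency_weight(age_days: int, window_days: int) -> float:
    """Assumed linear decay: 1.0 for a brand-new trigger, 0.0 at the window edge."""
    return max(0.0, 1.0 - age_days / window_days)


def score(account: Account, window_days: int = NEW_WINDOW_DAYS) -> float:
    # Industry fit is necessary: without it the account earns nothing.
    if not account.industry_fit:
        return 0.0
    # Industry fit is not sufficient: with no trigger there is nothing to score.
    if account.trigger_age_days is None:
        return 0.0

    raw = account.trigger_strength * recency_weight(account.trigger_age_days, window_days)

    # Hard penalty when no probable buyer is identified within two layers.
    buyer_found = (
        account.buyer_layers is not None
        and account.buyer_layers <= MAX_BUYER_LAYERS
    )
    if not buyer_found:
        raw -= NO_BUYER_PENALTY

    return min(100.0, max(0.0, raw))
```

Under these assumptions, an account with a 30-day-old trigger of strength 80 and a known buyer scores 60 with the 120-day window and 40 with the 60-day window. The signal is the same in both cases; the tighter window just gives it less credit.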

The hard part

The hard part of shipping this was the customer communication. Reps who had been celebrating 90+ scores for a year suddenly saw 50s. We spent a week answering the same question: did the leads get worse? The honest answer is no. The leads did not get worse. The number got more honest. The reps who trusted the change saw their conversion per high-scored lead go up, because the high-scored leads were now actually high.
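
If you want to check conversion per high-scored lead on your own data, one way to do it is sketched below in Python. The function name, the (score, converted) tuple layout, and the threshold are assumptions for illustration, not part of the product.

```python
from typing import Iterable, Optional, Tuple


def conversion_rate_above(
    leads: Iterable[Tuple[float, bool]], threshold: float
) -> Optional[float]:
    """Share of leads scored at or above `threshold` that converted.

    Each lead is a (score, converted) pair. Returns None when no lead
    reaches the threshold, so an empty band is not mistaken for 0%.
    """
    high = [converted for lead_score, converted in leads if lead_score >= threshold]
    if not high:
        return None
    return sum(high) / len(high)
```

Comparing this rate before and after a recalibration, using each model's own upper band, shows whether a high score has become a better predictor even though fewer leads reach it.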

Inflated scoring feels good on the dashboard. It is silently making your reps worse, because they are working signals that do not deserve their time.

What we learned

Recalibrate against the current customer base, not the founding one. Tighten time windows aggressively. Penalize missing buyer paths; do not just credit industry fit. And when the model gets more honest, the score will drop. That is not a bug. That is the model finally telling the truth.

“An honest score below 60 is more useful than an inflated score above 90. The reps will adjust. The dashboard will look quieter. Pipeline will go up.”

