Why our scoring model dropped 40 points after recalibration
A transparent product post about the most uncomfortable model adjustment we have ever shipped, and the reason it made the product better.
Last month we shipped a scoring model recalibration that dropped the average score across the system by roughly 40 points. A signal that used to read as a 92 now reads as a 51. We knew this would feel bad. We did it anyway. Here is the long version of why.
The original model
We launched the scoring model two years ago with calibration tuned against our earliest customers. That cohort, by accident, was weighted heavily toward integrators serving warehouse and distribution. The model learned that big square footage and freight LLCs were excellent signals, and it scored them highly. That calibration was right for the cohort at the time.
What changed
Two years of growth widened the customer base. We now have integrators in education, healthcare, cannabis, and a long tail of mixed verticals. The original calibration kept giving high scores to warehouse signals, even when the customer was a school district integrator who could not act on them. The dashboard looked busy. The pipeline did not move.
Why the drop
- We recalibrated against the broader customer base, not the founding cohort
- We tightened the trigger window weighting from 120 days to 60 days, which dropped many older signals out of the upper band
- We added a hard penalty for accounts where we could not identify a probable buyer within two layers of the org
- We stopped giving credit for industry fit alone. Industry fit is necessary but not sufficient: a high score also needs a real trigger
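
For readers who think in code, here is a minimal sketch of how the changes above could fit together. It is not the production model. Every name, weight, and threshold in it is an assumption made for illustration: the linear decay inside the trigger window, the 30-point size of the buyer penalty, and the `Signal` fields are all hypothetical. Only the 120-day and 60-day windows and the two-layer buyer rule come from the list above.

```python
from dataclasses import dataclass
from typing import Optional

# Illustrative constants. Only the 60-day window and the two-layer rule
# come from the post; the penalty size is a made-up value.
TRIGGER_WINDOW_DAYS = 60
MAX_BUYER_LAYERS = 2
MISSING_BUYER_PENALTY = 30.0  # hypothetical


@dataclass
class Signal:
    industry_fit: bool
    trigger_age_days: Optional[int]  # None when no real trigger was observed
    buyer_layers: Optional[int]      # org layers to a probable buyer; None if unknown


def trigger_weight(age_days: Optional[int], window: int = TRIGGER_WINDOW_DAYS) -> float:
    """Assumed linear decay: 1.0 at day 0, 0.0 at or beyond the window edge."""
    if age_days is None or age_days < 0 or age_days >= window:
        return 0.0
    return 1.0 - age_days / window


def score(signal: Signal) -> float:
    # Industry fit acts as a gate. It earns no points on its own.
    if not signal.industry_fit:
        return 0.0
    # All credit comes from a recent, real trigger.
    points = 100.0 * trigger_weight(signal.trigger_age_days)
    # Hard penalty when no probable buyer is found within two org layers.
    if signal.buyer_layers is None or signal.buyer_layers > MAX_BUYER_LAYERS:
        points -= MISSING_BUYER_PENALTY
    return max(0.0, points)
```

Under these assumptions, a 90-day-old trigger still carries weight with a 120-day window (`trigger_weight(90, window=120)`) and carries none with the 60-day window, which is the mechanism that pushes older signals out of the upper band.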
The hard part
The hard part of shipping this was the customer communication. Reps who had been celebrating 90+ scores for a year suddenly saw 50s. We spent a week answering the same question: did the leads get worse? The honest answer is that they did not. The number got more honest. The reps who trusted the change saw their conversion per high-scored lead go up, because the high-scored leads were now actually high.
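
If you want to check the same thing on your own data, conversion per high-scored lead is a simple ratio. The sketch below assumes a hypothetical list of `(score, converted)` pairs and a threshold you choose for "high". Neither the function name nor the data shape comes from the product.

```python
from typing import Iterable, Optional, Tuple


def conversion_in_band(
    leads: Iterable[Tuple[float, bool]], threshold: float
) -> Optional[float]:
    """Share of leads scored at or above `threshold` that converted.

    Returns None when no lead reaches the threshold, so an empty band
    is not mistaken for a 0% conversion rate.
    """
    outcomes = [converted for lead_score, converted in leads if lead_score >= threshold]
    if not outcomes:
        return None
    return sum(outcomes) / len(outcomes)
```

Run it on the same leads before and after a recalibration. The overall conversion rate will not change, but the rate inside the high band should rise if the new scores are more honest.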
Inflated scoring feels good on the dashboard. It also silently makes your reps worse, because they spend their time working signals that do not deserve it.
What we learned
Recalibrate against the current customer base, not the founding one. Tighten time windows aggressively. Penalize missing buyer paths instead of only crediting industry fit. And when the model gets more honest, the score will drop. That is not a bug. That is the model finally telling the truth.
“An honest score below 60 is more useful than an inflated score above 90. The reps will adjust. The dashboard will look quieter. Pipeline will go up.”
If this resonated, it'll feel familiar in the product.
Try Blacksmith against your real territory for 14 days. No card, no metered AI credits, no surprises.