Post-opening validation: how to know if a site selection model was right

Selecting a site is half the work. The other half happens after the doors open, when you can check whether the forecast and the score held up against real sales. Most expansion teams skip that step, and the model never gets a chance to prove itself or improve.

Quick answer

Post-opening validation compares what your model predicted against what a new site actually did. You measure forecast error with a metric like MAPE, track the real sales ramp against the predicted curve, and check whether the site score lined up with performance. The misses show you which weights, analogs, and assumptions to recalibrate, so the next forecast comes in sharper.

The missing loop: why validation matters

Most expansion teams put real effort into the decision and almost none into the postmortem. A site gets scored, a forecast gets signed off, the lease gets approved, and attention moves to the next deal. A year later, nobody goes back to ask whether the number the model produced came anywhere near the truth.

That gap costs money. A model nobody checks cannot improve, and it drifts quietly as markets, competitors, and your own network shift around it. Validation closes the loop and gives you grounds to defend a scorecard with evidence rather than habit.

Closing the loop takes discipline more than new math. The prediction is already on record. Once the site has traded long enough to mean something, you set it next to reality and write down what you learned.

What to measure

  • Forecast versus actual is the headline comparison. Put the predicted first-year sales or volume next to what the site really did, and read the signed gap. It shows whether the model ran hot (forecast above actual) or cold (forecast below actual), as well as how large the miss was.
  • Mean absolute percentage error, or MAPE, collapses the gap into one comparable number: the average of the absolute gap between forecast and actual, expressed as a percentage of actual. Tracked across a cohort of openings, it shows how accurate the model is on average and whether recalibration is improving that accuracy.
  • Ramp versus predicted matters because new stores do not open at maturity. Compare the actual month-by-month ramp to the curve you assumed, since a site can reach its mature number while taking far longer or shorter to get there, and that timing reshapes the pro forma.
  • Score versus performance steps back from the dollar figure to ask whether the explainable site score ranked locations in the right order. When the high-scoring sites outperform the low-scoring ones, the model is sorting deals well even where a single point estimate misses.
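As a rough illustration of the first, second, and fourth measures, here is a minimal Python sketch. The sign convention (positive means the forecast ran hot), the choice to divide by actual sales, and the use of Spearman rank correlation as the ordering check are all assumptions made for this example, and the figures are invented.

```python
from statistics import mean

def signed_pct_error(forecast, actual):
    """Signed gap as a fraction of actual. Positive: forecast ran hot (assumed convention)."""
    return (forecast - actual) / actual

def mape(forecasts, actuals):
    """Mean absolute percentage error across a cohort, returned as a fraction."""
    return mean(abs(signed_pct_error(f, a)) for f, a in zip(forecasts, actuals))

def ranks(values):
    """Rank values from 1 (smallest) upward, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(scores, actuals):
    """Spearman rank correlation, computed as the Pearson correlation of the ranks."""
    rx, ry = ranks(scores), ranks(actuals)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Invented cohort of four openings: forecast sales, actual sales, site score.
forecasts = [1_000_000, 1_200_000, 900_000, 1_500_000]
actuals = [800_000, 1_250_000, 900_000, 1_200_000]
scores = [62, 81, 70, 88]

for f, a in zip(forecasts, actuals):
    print(f"signed error: {signed_pct_error(f, a):+.1%}")
print(f"cohort MAPE: {mape(forecasts, actuals):.1%}")
print(f"score vs performance (Spearman): {spearman(scores, actuals):.2f}")
```

A rank correlation near 1 suggests the score is sorting sites in the right order even when individual dollar forecasts miss. MAPE is undefined for a site with zero actual sales, and the correlation is undefined when every score or every actual is identical; this sketch handles neither case.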

Choosing comparison windows

When you measure matters as much as what you measure. The first weeks of any opening run hot, with a grand opening, local press, curiosity traffic, and a promotion calendar that flatters the totals. Judge a site on that window and every location reads like a winner.

Wait for the opening rush to fade and the site to settle into a steady run rate. A trailing window of stabilized weeks, well after launch, gives a fair read on the mature number the forecast was trying to predict. Apply the same window to every site so comparisons stay consistent, and keep the opening period as a separate ramp observation instead of folding it into the verdict.
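One way to apply the same window to every site is to split each weekly sales series in code. The sketch below assumes weekly data, and the defaults of 12 opening weeks and a 13-week trailing window are placeholder values, not recommendations; pick whatever fits your format and keep it fixed across the cohort.

```python
from statistics import mean

def split_ramp_and_stabilized(weekly_sales, opening_weeks=12, window_weeks=13):
    """Split a weekly series into the opening ramp and a trailing stabilized window.

    opening_weeks and window_weeks are hypothetical defaults for illustration.
    """
    if len(weekly_sales) < opening_weeks + window_weeks:
        raise ValueError("site has not traded long enough for a fair read")
    ramp = weekly_sales[:opening_weeks]          # kept as a separate ramp observation
    stabilized = weekly_sales[-window_weeks:]    # the window used for the verdict
    return ramp, stabilized

def stabilized_run_rate(stabilized_weeks):
    """Average weekly sales over the stabilized window."""
    return mean(stabilized_weeks)

# Invented series: a hot opening, a fade, then a steady run rate.
weekly = [50_000] * 4 + [30_000] * 8 + [20_000] * 13
ramp, stabilized = split_ramp_and_stabilized(weekly)
print(len(ramp), len(stabilized), stabilized_run_rate(stabilized))
```

Raising an error for a site that is too young keeps an early, flattering read from slipping into the cohort. A 13-week window does not cover a full seasonal cycle, so a seasonal business may need a longer window or a seasonal adjustment.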

Cohort and analog tracking

One opening proves little on its own. A group of them can prove a pattern. Group your openings by vintage, by format, or by market type, and watch how error trends across the set. A single site that missed badly might be a fluke, while ten sites that all came in under forecast are a pattern the model needs to absorb.
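A cohort view can be as simple as grouping validated openings by one attribute and summarizing the error in each group. In this sketch the dictionary keys (`forecast`, `actual`, `format`, `vintage`) are hypothetical field names and the data is invented.

```python
from collections import defaultdict
from statistics import mean

def cohort_summary(openings, key):
    """Group openings by `key` and report count, mean signed error (bias), and MAPE."""
    groups = defaultdict(list)
    for site in openings:
        pct_error = (site["forecast"] - site["actual"]) / site["actual"]
        groups[site[key]].append(pct_error)
    return {
        cohort: {
            "n": len(errors),
            "bias": mean(errors),                  # positive: cohort forecasts ran hot
            "mape": mean(abs(e) for e in errors),
        }
        for cohort, errors in groups.items()
    }

# Invented openings; field names are assumptions for illustration.
openings = [
    {"format": "mall", "vintage": 2023, "forecast": 110, "actual": 100},
    {"format": "mall", "vintage": 2024, "forecast": 120, "actual": 100},
    {"format": "street", "vintage": 2023, "forecast": 90, "actual": 100},
    {"format": "street", "vintage": 2024, "forecast": 110, "actual": 100},
]

for cohort, stats in cohort_summary(openings, "format").items():
    print(cohort, stats)
```

Reading bias next to MAPE separates the two cases described above. In this invented data the mall cohort misses in one direction, which points to a systematic problem, while the street cohort's errors cancel out and look more like noise.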

Watch the analogs closely. Most forecasts lean on comparable existing locations to predict a new one. After opening, check whether the analogs you chose actually behaved like the new site. Where they did, lean on that analog set harder. Where they did not, the matching logic itself needs work, and retuning weights alone will not fix it.
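A simple version of the analog check compares each comparable site's sales to what the new site actually did over the same stabilized window. The sketch below reduces "behaved like the new site" to a sales-level gap, which is a deliberate simplification, and the 20% tolerance is a hypothetical threshold.

```python
def analog_gaps(actual, analog_sales):
    """Signed gap of each analog's sales relative to the new site's actual sales."""
    return {name: (sales - actual) / actual for name, sales in analog_sales.items()}

def weak_analogs(actual, analog_sales, tolerance=0.20):
    """Names of analogs whose sales differ from the new site by more than `tolerance`."""
    gaps = analog_gaps(actual, analog_sales)
    return sorted(name for name, gap in gaps.items() if abs(gap) > tolerance)

# Invented figures: the new site's stabilized sales and three analogs.
new_site_actual = 100
analogs = {"A": 105, "B": 140, "C": 90}

print(analog_gaps(new_site_actual, analogs))
print(weak_analogs(new_site_actual, analogs))
```

An analog that lands far from the new site on several openings in a row is a candidate for replacement, or a hint that the matching criteria are too loose.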

Recalibrating the model

Validation only pays off when it feeds back into the model. Once you can see where the misses cluster, three levers are available, and the data points to whichever one is loose.

  • Weights come first when high-scoring sites underperform low-scoring ones. Adjust how much each component of the score counts until the ranking matches what the cohort actually did.
  • Analogs are the issue when the comparable sites you leaned on did not behave like the new opening. Tighten the matching: require a closer fit on trade-area shape, competition, and demographics, or start from a cleaner set of peers.
  • Assumptions are the fix when the mature number came in close but the timing was off. Repair the ramp curve and the seasonality assumptions rather than the site model, since the trade area was often right and only the calendar was wrong.

Recalibrate deliberately, one lever at a time, so you can tell what actually moved the accuracy. A model that swings hard after every opening is overfitting to the last data point rather than learning from the cohort.
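The triage above can be written down as a small rule so that it is applied the same way each cycle. This is only a sketch: the function name, the inputs, and every threshold are hypothetical, and the cutoffs would need to be set from your own cohort.

```python
def suggest_lever(
    rank_corr,            # score vs performance rank correlation for the cohort
    weak_analog_share,    # fraction of analogs that did not behave like the new sites
    mature_pct_error,     # signed error on the stabilized (mature) number
    ramp_months_off,      # actual minus predicted months to reach maturity
    min_rank_corr=0.5,    # all thresholds below are illustrative assumptions
    max_weak_share=0.3,
    max_mature_error=0.10,
    max_ramp_months=3,
):
    """Return the single lever the evidence points to, or 'none'."""
    if rank_corr < min_rank_corr:
        return "weights"        # the score is not ranking sites well
    if weak_analog_share > max_weak_share:
        return "analogs"        # the comparables did not behave like the openings
    timing_off = abs(ramp_months_off) > max_ramp_months
    if abs(mature_pct_error) <= max_mature_error and timing_off:
        return "assumptions"    # mature number was close, the calendar was wrong
    return "none"

print(suggest_lever(0.2, 0.0, 0.02, 0))
print(suggest_lever(0.8, 0.5, 0.02, 0))
print(suggest_lever(0.8, 0.1, 0.05, 6))
```

Returning one lever at most mirrors the advice to change one thing per cycle. Feeding the function cohort-level numbers rather than a single opening guards against overreacting to the latest data point.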

Recording what changed

Every recalibration should leave a paper trail. Keep a decision record for each validated site: the prediction, the actual, the error, and the specific change you made to the model in response. Note the comparison window and the data vintages so the next person can reproduce the read instead of relitigating it.
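A decision record does not need special tooling. The sketch below shows one possible shape, using a Python dataclass serialized to JSON. The field names and values are hypothetical and are not the format of any particular product.

```python
import json
from dataclasses import asdict, dataclass

@dataclass(frozen=True)
class DecisionRecord:
    """One validated site: what was predicted, what happened, and what changed."""
    site_id: str
    predicted: float
    actual: float
    window: str            # the comparison window used for the read
    data_vintages: dict    # which data releases fed the forecast and the actuals
    model_change: str      # the specific change made in response, or "none"
    recorded_on: str       # ISO date the record was written

    @property
    def pct_error(self):
        """Signed error as a fraction of actual. Positive: the forecast ran hot."""
        return (self.predicted - self.actual) / self.actual

    def to_json(self):
        payload = asdict(self)
        payload["pct_error"] = round(self.pct_error, 4)
        return json.dumps(payload, sort_keys=True)

# Invented example record.
record = DecisionRecord(
    site_id="site-001",
    predicted=1_000_000,
    actual=800_000,
    window="trailing 13 stabilized weeks",
    data_vintages={"sales": "2024-Q4", "demographics": "2023"},
    model_change="none",
    recorded_on="2025-01-15",
)
print(record.to_json())
```

Freezing the dataclass makes a record hard to edit by accident after the fact, which suits something meant to serve as a paper trail.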

Documentation is also where confidence comes from. When a committee asks why the model deserves trust, the answer stops being a feeling and becomes a history of forecasts checked against actuals and weights adjusted on evidence. A site brief lands harder when the model behind it carries a track record you can point to.

Turning misses into improvements

Treat every miss as raw material. A gap between forecast and actual is the clearest signal you have about what the model failed to see. The teams that end up with trustworthy models treat each gap as a question worth answering about what got overlooked and how to teach the model to catch that pattern next time.

Run validation on a steady cadence, fold the lessons back into the weights and analogs, and the error trend bends in the right direction over time. Accuracy earned opening by opening is what separates a model people quietly distrust from one they will back with real capital.

What each metric tells you, and what to do about it

| Metric | What it tells you | Action |
| --- | --- | --- |
| Forecast vs actual (signed gap) | Whether the model runs hot or cold, and by how much | Correct a consistent bias before touching individual weights |
| MAPE across a cohort | Average accuracy and whether it is improving | Set a target and track it opening over opening |
| Ramp vs predicted | Whether the timing to maturity matched the curve | Fix the ramp and seasonality assumptions, not the site model |
| Score vs performance | Whether the score ranked sites in the right order | Retune factor weights so high scores outperform low scores |
| Analog behavior | Whether your comparable sites acted like the new one | Tighten the analog matching or replace weak peers |

Frequently asked questions

How long after opening should I wait to validate a site?
Wait for the opening spike to fade and the site to reach a steady run rate, then read a trailing window of stabilized weeks. Use the same window for every location so comparisons stay consistent, and log the ramp period as a separate observation.

What is MAPE and why use it?
MAPE is mean absolute percentage error, the average size of the gap between forecast and actual stated as a percentage. It turns many openings into one comparable accuracy number, so you can tell whether recalibration is making the model better.

What if the forecast missed but the site score was right?
That split is useful. If high-scoring sites still outperform low-scoring ones, the model is ranking deals correctly even where a dollar forecast misses. Fix the point estimate and the ramp assumptions, and keep trusting the order the score produced.

How often should the model be recalibrated?
On a steady cadence as openings accumulate, not after every store. Move one lever at a time (weights, analogs, or assumptions) so you can see what shifted accuracy. Frequent wild swings mean the model is overfitting to the last data point.

Does Geod support this kind of validation?
Geod produces explainable scores, drive-time trade areas, and decision records, which give you the predicted values and documented assumptions to compare against actuals. The validation loop stays the same whatever the tool: record the prediction, wait for a fair window, measure the gap, and recalibrate.
