Site selection scoring models: building a defensible weighted scorecard

A site score is only useful if every point can be explained. This is a practical guide to building a weighted scorecard a real estate committee can challenge, adjust, and approve, rather than a black box they have to take on faith.

Quick answer

A site selection scoring model ranks how well each candidate fits your strategy by combining weighted criteria into a single number. A defensible one shows the weights, the contribution of each factor, the data behind each input, and the vintage and confidence of that data, so a real estate committee can see why a site scored what it did and adjust the model before they approve a deal.

What a site selection scoring model is

A site selection scoring model is a structured way to rate candidate locations against the things that drive performance for your concept. You define a set of criteria, give each one a weight that reflects how much it matters, score every site on each criterion, and combine those into a single number you can rank. A 70 means a site lines up better with your strategy than a 55, by the rules you set in advance.

The model does not promise that a store will succeed. It tells you which sites deserve a closer look and a forecast, and which ones can come off the list early. That is a useful job on its own, as long as the number behind it can be opened up and questioned.

Score, forecast, overlay, and recommendation are four different things

These four often get collapsed into one figure, and that is where committee arguments start. A score ranks fit. A forecast sizes performance, usually as a sales or revenue range. Overlays add the context a fit score leaves out, including cannibalization against your own stores, market saturation, site feasibility, and how confident the model is in its inputs. A recommendation is what you do with all of it: approve, reject, send back for research, or revise the deal.

Keeping them separate lets a reviewer agree that a site fits while still questioning whether the forecast holds or whether it would eat into a neighbor. When they are mashed together, the only thing left to argue with is a single number, and nobody can tell which part of it they disagree with. The table below lays out what each artifact measures and what it decides.

Score vs forecast vs overlay vs recommendation

| Artifact | What it measures | What it decides |
| --- | --- | --- |
| Score | How well a site fits your strategy, as weighted criteria rolled into one number | Ranking: which candidates rise to the top of the pipeline |
| Forecast | Expected performance, such as a sales or revenue range for the site | Sizing: whether the upside clears your hurdle |
| Overlay | Context the score leaves out: cannibalization, saturation, feasibility, confidence | Adjustment: where the ranking needs a second look |
| Recommendation | The combined read across score, forecast, and overlays | Action: approve, reject, research, or revise |
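
One way to keep the four artifacts from collapsing into one figure is to store them as separate fields. The sketch below is illustrative only: the class names, field names, and sample values are assumptions made for this example, not a Geod data format.

```python
from dataclasses import dataclass, field
from typing import Literal


@dataclass(frozen=True)
class Forecast:
    low: float   # low end of the expected annual sales range
    high: float  # high end of the expected annual sales range


@dataclass(frozen=True)
class SiteReview:
    site_id: str
    score: float        # fit ranking on a 0-100 scale
    forecast: Forecast  # sizing, kept as a range rather than one figure
    # Overlays are notes held beside the score, never folded into it.
    overlays: dict[str, str] = field(default_factory=dict)
    # Default to "research" until a committee decides otherwise.
    recommendation: Literal["approve", "reject", "research", "revise"] = "research"


# Hypothetical site with made-up values.
review = SiteReview(
    site_id="site-A",
    score=70.0,
    forecast=Forecast(low=1_800_000, high=2_300_000),
    overlays={
        "cannibalization": "overlaps one existing store",
        "confidence": "medium",
    },
)
```

Because each artifact has its own field, a reviewer can accept the score and still push back on the forecast or an overlay without touching the others.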

The core components: Reach, Demand, Competition, Accessibility

Durable scorecards group their criteria under a few components, and Geod organizes them into four. Reach covers how many people the site can actually draw from, measured against a drive-time or walk-time trade area instead of a flat radius. Demand asks whether those people match your customer, looking at income, household makeup, daytime population, and spending in your category. Competition weighs who else already serves that demand nearby and how strong they are. Accessibility covers how easy the site is to reach and use, including road access, visibility, turn-in, and parking.

In Geod the default split is Reach 30, Demand 30, Competition 25, and Accessibility 15. Those weights are a starting point, not a verdict. A concept that lives or dies on convenience will want Accessibility carrying more, while a destination format can afford to lean on Reach and Demand, and the model should let you make that choice explicitly.
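
As a rough illustration of how those weights roll up, the sketch below computes each component's contribution and the total. It assumes every component is already scored on a 0-100 scale, and the site values and the convenience-format weights are invented for the example.

```python
# Default component weights from the text; they sum to 100.
DEFAULT_WEIGHTS = {"reach": 30, "demand": 30, "competition": 25, "accessibility": 15}


def contributions(component_scores, weights=DEFAULT_WEIGHTS):
    """Return the points each component adds to the final score.

    Assumes component scores are on a 0-100 scale (an assumption of this sketch).
    """
    total_weight = sum(weights.values())
    return {
        name: component_scores[name] * weight / total_weight
        for name, weight in weights.items()
    }


def total_score(component_scores, weights=DEFAULT_WEIGHTS):
    """Sum the per-component contributions into one number."""
    return sum(contributions(component_scores, weights).values())


# Hypothetical site: strong reach, weak accessibility.
site = {"reach": 80, "demand": 70, "competition": 60, "accessibility": 50}

parts = contributions(site)
# reach 24.0, demand 21.0, competition 15.0, accessibility 7.5
default_total = total_score(site)  # 67.5

# Hypothetical weights for a convenience-led concept that leans on Accessibility.
convenience_weights = {"reach": 25, "demand": 25, "competition": 20, "accessibility": 30}
convenience_total = total_score(site, convenience_weights)  # 64.5
```

The same site drops from 67.5 to 64.5 once Accessibility carries more weight, which is the explicit choice the weights are meant to expose.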

How weights, thresholds, and gates work

Weights decide how much each component moves the final score, and they are where your strategy gets written down. A grocery-anchored concept might load Demand and Reach, while a convenience format leans on Accessibility and traffic. Change a weight and you change what kind of site rises to the top, which is exactly why the weights should be a deliberate decision rather than a default nobody looked at.

Gates work differently. A weight lets a strong factor make up for a weak one, so a great trade area can carry a mediocre access score. A gate refuses that trade. If a site falls below a hard minimum on something you will not compromise, such as a parking count or a co-tenant requirement, the gate caps or fails the score regardless of how well the site does elsewhere. Most teams use weights for the factors that trade off against each other, and gates for the few that are non-negotiable.
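
A gate can be sketched as a check that runs after the weighted score is computed. The function name, the fact names, and the minimum of 40 parking spaces below are hypothetical, chosen only to show the cap-or-fail behavior.

```python
def apply_gates(score, site_facts, gates, cap=None):
    """Apply hard minimums on top of a weighted score.

    gates maps a fact name to its required minimum. If any gate fails, the
    score is capped at `cap`, or the site fails outright (None) when no cap
    is given. A fact missing from site_facts raises KeyError instead of
    passing silently.
    Returns (final_score_or_None, names_of_failed_gates).
    """
    failed = [name for name, minimum in gates.items() if site_facts[name] < minimum]
    if not failed:
        return score, failed
    final = min(score, cap) if cap is not None else None
    return final, failed


gates = {"parking_spaces": 40}  # hypothetical non-negotiable minimum

# A strong weighted score does not rescue a site that misses the gate.
print(apply_gates(82.0, {"parking_spaces": 25}, gates))          # (None, ['parking_spaces'])
print(apply_gates(82.0, {"parking_spaces": 25}, gates, cap=50))  # (50, ['parking_spaces'])
print(apply_gates(82.0, {"parking_spaces": 60}, gates))          # (82.0, [])
```

Returning the list of failed gates alongside the result keeps the reason for a capped or failed score visible to the reviewer.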

Why cannibalization and saturation belong in overlays

It is tempting to fold cannibalization and saturation into the score as one more criterion, so a site that overlaps an existing store simply scores lower. That hides the most important conversation on the table. Cannibalization depends on your footprint. It changes with which of your stores sit nearby and how much sales transfer you are willing to accept for the network, so the same parcel can be a clear win for one operator and a mistake for another.

Saturation behaves the same way. A market can be the right market and still be too full this year. Keeping both as overlays on top of the fit score lets a committee see a strong site and the transfer it would cause side by side, then decide whether the net gain is worth it. Buried inside a single number, that judgment quietly disappears into the weighting where no one can find it.
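
The side-by-side view comes down to simple arithmetic. The sketch below uses invented figures and a hypothetical helper to show the net gain for an operator with two nearby stores, and for an operator with none.

```python
def net_network_gain(new_site_sales, transfers):
    """Sales the network gains after subtracting sales moved from existing stores.

    transfers maps an existing store id to the annual sales expected to shift
    to the new site.
    """
    return new_site_sales - sum(transfers.values())


# Illustrative figures only, not real data.
forecast_sales = 2_000_000

# Operator A already has two stores close to the parcel.
transfers_a = {"store-12": 350_000, "store-27": 150_000}
net_a = net_network_gain(forecast_sales, transfers_a)         # 1_500_000
transfer_rate_a = sum(transfers_a.values()) / forecast_sales  # 0.25

# Operator B has no stores nearby, so nothing is transferred.
net_b = net_network_gain(forecast_sales, {})                  # 2_000_000
```

The fit score for the parcel is identical in both cases. Only the overlay changes, which is why it is shown next to the score instead of inside it.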

Keep feasibility and confidence out of the score

Feasibility and confidence cause similar trouble for different reasons. Feasibility, meaning whether the deal can actually be built and operated within your economics, belongs in a gate or a separate track rather than a few points shaved off a fit score. A site can fit your strategy and still be undeliverable on rent, zoning, or buildout, and you want that to stop the deal, not lower its grade by three points.

Confidence is a statement about the inputs. When demographics are current and competition is well mapped, the score rests on solid ground. When a key figure is stale or estimated, the same number means less. Folding confidence into the score punishes sites with thin data and rewards sites with rich data, which is backwards. Show confidence next to the score instead, so a reviewer can weigh how much to trust it.

Handling missing or low-confidence data

Real candidates always have gaps. A new corridor has no foot-traffic history. A rural trade area can have sparse demographic detail. Sometimes a competitor opened last month and has not reached the data yet, or a parcel sits on a boundary where two sources disagree. A scorecard needs a rule for these cases that does not quietly invent a number.

One honest approach marks the field as missing and lets the gap lower confidence rather than the score. Another uses a clearly labeled estimate with its source attached. A third holds the site for manual review until someone fills the gap. What you want to avoid is a default of zero, or a median quietly substituted in, that lands in the final number with no flag, because that turns a data problem into a scoring problem nobody can see. Geod attaches a vintage and a confidence level to its inputs, so a thin field shows up as lower confidence instead of a falsely precise score.
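
A minimal sketch of that rule follows. It keeps a missing value as `None`, flags estimates, and reports confidence separately from any score. The field names, sources, and the half credit given to estimates are assumptions made for illustration, not how Geod computes confidence.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Input:
    name: str
    value: Optional[float]   # None means missing; never substitute 0 or a median
    source: str
    vintage: str             # how recent the figure is
    estimated: bool = False  # True for a clearly labeled estimate


def confidence(inputs):
    """Share of inputs backed by data: measured counts 1, estimated 0.5, missing 0.

    The half credit for estimates is an arbitrary choice for this sketch.
    """
    credit = sum(
        0.0 if i.value is None else 0.5 if i.estimated else 1.0
        for i in inputs
    )
    return credit / len(inputs)


def missing_fields(inputs):
    """Names of inputs that need manual review before the site is scored."""
    return [i.name for i in inputs if i.value is None]


# Hypothetical inputs for one candidate site.
inputs = [
    Input("households_10min", 18_400, "census extract", "2023"),
    Input("daytime_population", 9_000, "analyst estimate", "2025", estimated=True),
    Input("foot_traffic", None, "no history for new corridor", "n/a"),
    Input("competitor_count", 4, "field survey", "2025"),
]

print(confidence(inputs))      # 0.625
print(missing_fields(inputs))  # ['foot_traffic']
```

The gap shows up as a lower confidence figure and a named field to chase, and no placeholder value ever reaches the score.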

Set the weights before you score

The fastest way to lose a committee is to let the weights move after the scores are in. Once you can see that a favored site landed at 68, it is easy to nudge the Accessibility weight up a couple of points until the site clears 70, and now the model serves the deal instead of the other way around. Lock the weights first, in a version everyone has agreed to, and score every candidate against that same rule.

If the weights genuinely need to change, change them deliberately and rescore the whole pipeline, so the comparison stays fair across every site. A scorecard that gets quietly retuned per deal stops being a model and turns into a rationalization with a number attached, which is the one thing a real estate committee is right to distrust.
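
Locking and rescoring can be sketched with an immutable, versioned model. Everything here is hypothetical: the class, the version labels, the second set of weights, and the two sites exist only to illustrate the discipline.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class ScoringModel:
    version: str
    weights: MappingProxyType  # read-only view, so weights cannot be edited in place

    def score(self, component_scores):
        total_weight = sum(self.weights.values())
        weighted = sum(component_scores[k] * w for k, w in self.weights.items())
        return weighted / total_weight


def make_model(version, weights):
    """Freeze a copy of the weights under a version label."""
    return ScoringModel(version, MappingProxyType(dict(weights)))


def rescore_pipeline(model, sites):
    """Score every candidate with the same model and tag each result with its version."""
    return {site_id: (model.score(s), model.version) for site_id, s in sites.items()}


sites = {
    "site-A": {"reach": 80, "demand": 70, "competition": 60, "accessibility": 50},
    "site-B": {"reach": 60, "demand": 60, "competition": 70, "accessibility": 90},
}

v1 = make_model("v1", {"reach": 30, "demand": 30, "competition": 25, "accessibility": 15})
results_v1 = rescore_pipeline(v1, sites)
# {'site-A': (67.5, 'v1'), 'site-B': (67.0, 'v1')}

# A deliberate change becomes a new version, applied to the whole pipeline.
v2 = make_model("v2", {"reach": 25, "demand": 25, "competition": 20, "accessibility": 30})
results_v2 = rescore_pipeline(v2, sites)
# {'site-A': (64.5, 'v2'), 'site-B': (71.0, 'v2')}
```

Because the weights sit behind a read-only mapping, an attempt such as `v1.weights["accessibility"] = 17` raises a `TypeError`, so a change has to arrive as a new version that rescores every site.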

Calibrate against your own openings

A scoring model earns trust when it agrees with the reality you already know. Before you rely on it, run your existing stores and a few sites you passed on through the scorecard. Your strong performers should score well, your weak ones should not, and the failures you avoided should read as failures.

When a known winner scores low, treat it as a signal to revisit a weight or a missing input, rather than a result to override by hand. This back-testing is also the most honest way to win over a skeptic on the committee, because you can show that the model would have ranked the last round of openings close to the way the numbers actually came out.
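
One simple back-test, sketched below, counts how often the scorecard orders a pair of existing stores the same way their sales did. Pairwise concordance is a technique chosen for this example, not a method the article prescribes, and the store scores and sales figures are invented.

```python
from itertools import combinations


def concordance(scores, actual_sales):
    """Fraction of store pairs that the score ranks in the same order as sales.

    1.0 means every pair is ordered as sales ordered it; around 0.5 is no
    better than chance. Pairs tied on score or on sales are skipped.
    """
    agree = total = 0
    for a, b in combinations(scores, 2):
        score_diff = scores[a] - scores[b]
        sales_diff = actual_sales[a] - actual_sales[b]
        if score_diff == 0 or sales_diff == 0:
            continue
        total += 1
        agree += (score_diff > 0) == (sales_diff > 0)
    return agree / total if total else float("nan")


# Made-up back-test data: scorecard results and annual sales in millions.
scores = {"store-1": 82, "store-2": 74, "store-3": 61, "store-4": 55}
sales = {"store-1": 2.4, "store-2": 2.6, "store-3": 1.5, "store-4": 1.1}

print(round(concordance(scores, sales), 3))  # 0.833
```

Here five of six pairs agree. The one miss, store-2 outselling the higher-scored store-1, is the kind of case worth tracing back to a weight or a missing input.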

What a defensible scorecard should show

A scorecard you can take into a committee and defend shows its work at every step. At a minimum, make these visible:

  • The criteria and the weight on each one, fixed before any site is scored.
  • The contribution each factor made to the final number, so a reviewer can see what carried the site and what held it back.
  • The data behind every input, with its source and how recent it is.
  • A confidence level that reflects how many fields are missing or estimated.
  • The cannibalization, saturation, and feasibility overlays, kept beside the score rather than mixed into it.
  • The model version, so two sites scored weeks apart stay comparable.

When all of that is on the page, the score becomes something a committee can challenge, adjust, and approve on the record. Geod presents the weighted contributions, source vintages, confidence, and overlays together, and exports them as a brief you can hand to reviewers. For the longer methodology behind an explainable model, the Geod blog covers site selection scoring models in depth.

Frequently asked questions

What is the difference between a site selection score and a sales forecast?
A score ranks how well a site fits your strategy, using weighted criteria rolled into one number. A forecast estimates how much that site is likely to sell, usually as a range. Use the score to shortlist candidates, then forecast the ones that make the cut.

How should weights be set in a scoring model?
Set weights to reflect what drives your concept, agree on them before scoring, and lock the version. Geod starts at Reach 30, Demand 30, Competition 25, and Accessibility 15, and lets you adjust. If you change weights, rescore the whole pipeline so every site is judged the same way.

Why keep cannibalization and saturation out of the score?
Both depend on your existing network and how much sales transfer you will accept, so the same parcel can be fine for one operator and a problem for another. As overlays they sit next to the fit score, letting a committee weigh a strong site against the transfer it would cause.

How do you handle missing or low-confidence data in a scorecard?
Avoid letting a gap become a silent zero or a quiet median in the final number. Mark the field as missing and let it lower confidence, use a labeled estimate with its source, or hold the site for manual review. Geod attaches a vintage and confidence to each input.

How do you keep a scoring model from being gamed?
Lock the weights before you see any scores, and apply the same version to every candidate. If a favored site falls short, revisit the model deliberately and rescore the pipeline rather than nudging one weight to clear a threshold. Calibrating against past openings is the strongest check.
