EXPLAINER — POLLING

How Presidential Polls Work

A presidential poll interviews a few hundred or a few thousand people — then claims to represent the views of hundreds of millions. How is that possible? And why do polls keep missing? Here is a complete, jargon-free guide to polling methodology: how samples are built, how results are weighted, what margin of error actually means, and what went wrong in 2016, 2020 and 2024.

  • ~800: typical sample size for a national presidential poll
  • ±3.5%: typical margin of error for an 800-respondent poll
  • 3.9 pts: average 2020 national poll error (overestimated Biden)
  • 95%: standard confidence level used in reporting margin of error

Step 1: Building a Sample

Every poll begins with a sample — a subset of people intended to represent a larger population. For a presidential poll, the target population is typically all likely voters in the United States, roughly 130-160 million people in recent presidential election years. A pollster cannot interview all of them, so they interview a manageable number — typically between 600 and 1,500 respondents — and use statistical theory to generalize the results.

The key principle is random sampling: every person in the target population should, in theory, have an equal chance of being selected. In practice, this is extremely difficult. The era of random-digit telephone dialing — when nearly everyone had a listed landline — made true random sampling feasible. Today, with declining landline ownership, caller ID avoidance, low response rates (often under 5%) and a shift to online panel polling, the "random" in random sampling is increasingly theoretical.
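
To make the ideal concrete, here is a minimal sketch of simple random sampling in Python, using a hypothetical 100,000-person population as a stand-in for a real voter file. Real-world sampling is far messier, because most people contacted never respond.

```python
import random

# Hypothetical toy population standing in for a real voter file.
population = [
    {"id": i, "age_group": random.choice(["18-29", "30-49", "50-64", "65+"])}
    for i in range(100_000)
]

# Simple random sampling: every record has an equal chance of selection,
# so the sample should mirror the population within sampling error.
sample = random.sample(population, k=800)

share_65_plus = sum(r["age_group"] == "65+" for r in sample) / len(sample)
print(f"65+ share in sample: {share_65_plus:.1%}")  # close to the population's ~25%
```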

Most modern polls draw from one of several sources: registered voter lists (which are public records), commercial consumer databases, opt-in online panels (where people have signed up to take surveys), or text message outreach. Each method has biases. Online panels oversample the internet-engaged. Registered voter lists miss newly registered voters. Phone polls underrepresent people who do not answer unknown numbers. Managing and correcting for these biases is the core challenge of modern polling.

Step 2: Likely Voter Models — Who Actually Votes?

Registering to vote and actually voting are two different things. In presidential elections, roughly 60-65% of eligible adults vote — which means any sample drawn from registered voters or all adults includes millions of people who will not cast a ballot. A poll that includes non-voters in its results will produce meaningfully different numbers than one that filters them out.

Likely voter (LV) models are the set of criteria pollsters use to determine which respondents to count in their reported results. The two most common approaches are:

  • Self-reported likelihood: Simply ask respondents "How likely are you to vote in the upcoming election?" and include only those who say they are "very likely" or "certain to vote." This is simple but susceptible to social desirability bias: many respondents say they will vote and then do not.
  • Composite scoring: Score respondents on multiple factors — past voting history (derived from public voter files), enthusiasm, interest, registration duration and stated intention — and include only those above a threshold score. Gallup pioneered this method with a seven-question scale. It is more predictive but more expensive; a minimal sketch of this approach follows the list.
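
Here is that sketch. The factor names, point values and cutoff are illustrative assumptions, not any firm's actual model (Gallup's included):

```python
# Hypothetical composite likely-voter screen, loosely modeled on the
# multi-factor approach described above.
def likely_voter_score(respondent: dict) -> int:
    score = 0
    score += 2 * respondent["past_elections_voted"]  # voter-file history (0-2 recent elections)
    score += respondent["enthusiasm"]                # self-rated 0-3
    score += respondent["interest_in_race"]          # self-rated 0-3
    score += 2 if respondent["says_certain_to_vote"] else 0
    return score

def is_likely_voter(respondent: dict, cutoff: int = 7) -> bool:
    return likely_voter_score(respondent) >= cutoff

respondent = {"past_elections_voted": 2, "enthusiasm": 3,
              "interest_in_race": 2, "says_certain_to_vote": True}
print(is_likely_voter(respondent))  # True: 4 + 3 + 2 + 2 = 11 >= 7
```

Respondents below the cutoff are still interviewed; they are simply excluded from the headline likely-voter numbers.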

The choice of likely voter model can shift a poll result by 2-4 percentage points. A pollster who expects high Republican turnout will build a more Republican-leaning likely voter screen; one expecting Democratic enthusiasm will tilt Democratic. This is one reason different pollsters show different results even when interviewing at the same time.

Step 3: Weighting — Correcting the Sample

Even with a well-designed sample, the people who respond to a poll are never perfectly representative of the electorate. Certain groups — older people, more educated people, politically engaged people — are more likely to respond to polls than others. If a poll's raw results include 65% college-educated respondents but college graduates make up only 38% of the electorate, the results are biased.

Weighting corrects for this by mathematically adjusting how much each respondent's answers count. A respondent from an underrepresented group gets a weight greater than 1 (counts more); a respondent from an overrepresented group gets a weight less than 1 (counts less). Pollsters typically weight simultaneously on age, gender, race, education, geographic region and party registration.
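
Here is a minimal sketch of weighting on a single variable, education, using the 65%/38% shares from the example above. Real pollsters balance many variables at once, typically through iterative raking.

```python
from collections import Counter

# Raw sample: 65% college graduates, versus an assumed 38% in the electorate.
sample = ["college"] * 650 + ["non_college"] * 350
population_share = {"college": 0.38, "non_college": 0.62}

sample_share = {g: n / len(sample) for g, n in Counter(sample).items()}

# Each respondent's weight = population share / sample share of their group.
weights = {g: population_share[g] / sample_share[g] for g in population_share}
print(weights)  # college ~0.58 (counts less), non_college ~1.77 (counts more)
```

Multiplying each respondent's answers by their group's weight reproduces the target education mix exactly: 650 × 0.58 and 350 × 1.77 yield a weighted 38/62 split.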

The critical — and controversial — decision is what the "correct" target looks like. For a registered voter poll, the target can be the actual registered voter population. But for a likely voter poll, the pollster must estimate what the electorate will look like on Election Day — a future event with inherent uncertainty. If a pollster assumes the electorate will look like 2020 but it ends up looking like 2016 (with higher white working-class turnout), their weighting targets will be wrong and their results will be systematically biased. This is the proximate cause of most major polling errors.

Margin of Error: What It Does and Does Not Mean

The margin of error (MOE) is the most commonly cited and most commonly misunderstood statistic in polling. It is calculated from sample size using standard statistical formulas: a poll of 800 respondents produces a margin of error of roughly ±3.5%, and a poll of 1,600 respondents produces a margin of roughly ±2.5%. Because the margin shrinks with the square root of the sample size, quadrupling the sample only halves the margin of error — which is why large-sample polls are expensive but not proportionally more accurate.
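
The standard formula behind these figures, for a proportion p and sample size n, is z·sqrt(p(1−p)/n), which is largest at p = 0.5. A short sketch reproduces the numbers above:

```python
import math

def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a proportion from a simple random sample of size n."""
    return z * math.sqrt(p * (1 - p) / n)

print(f"n=800:  ±{margin_of_error(800):.2%}")   # ±3.46%
print(f"n=1600: ±{margin_of_error(1600):.2%}")  # ±2.45%
print(f"n=3200: ±{margin_of_error(3200):.2%}")  # ±1.73%: four times n=800, half the margin
```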

At a 95% confidence level, the stated margin of error means: if you ran the identical poll many times with fresh random samples, about 95 of every 100 results would fall within the stated margin of the true value. A candidate polling at 47% with a ±3% MOE could plausibly be anywhere from 44% to 50%.
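
A quick simulation makes the 95% claim tangible. Under the idealized assumption of truly random samples from a population with 47% support, about 95% of 800-person polls land within ±3.5 points of the truth:

```python
import random

random.seed(0)
TRUE_SUPPORT, N, MOE, TRIALS = 0.47, 800, 0.035, 10_000

hits = 0
for _ in range(TRIALS):
    # One simulated poll: N independent respondents, each supporting
    # the candidate with probability TRUE_SUPPORT.
    poll = sum(random.random() < TRUE_SUPPORT for _ in range(N)) / N
    hits += abs(poll - TRUE_SUPPORT) <= MOE

print(f"Within the margin: {hits / TRIALS:.1%}")  # roughly 95%
```

The next point is exactly what this simulation cannot show: if the sampling itself is biased, every simulated poll would be off in the same direction, and the margin of error would never reveal it.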

What the margin of error does not capture: systematic error. The MOE only quantifies random sampling variation — the luck of which specific people happened to answer the phone. It says nothing about whether the sample was unrepresentative, whether the likely voter model was wrong, whether certain groups were underweighted, or whether the question wording pushed respondents toward a particular answer. These non-random errors — which are far more consequential in modern polling — are not reflected in the margin of error at all. This is why polls can show systematic errors of 5-7 points in one direction across an entire cycle while reporting a margin of error of ±3%.

Why Polls Missed: 2016, 2020 and 2024

2016: State-Level Failure, National Accuracy

National polls in 2016 showed Clinton +3; she won the popular vote by 2.1 points. The failure was almost entirely at the state level. Polls in Michigan, Wisconsin and Pennsylvania underestimated Trump by 5-7 points. The primary cause was under-weighting of white voters without college degrees: education had not strongly divided the parties in previous cycles, so many state pollsters did not include it in their weighting targets. A secondary factor was late movement: Trump gained 2-3 points in the final two weeks after FBI Director Comey's letter, a shift that polls fielded earlier could not capture.

2020: The Worst Polling Cycle in Decades

Biden was projected to win by 8-10 points nationally; he won by 4.5 points. State-level errors were larger: polls overestimated Biden by 7-8 points in Iowa, Ohio and Florida, all of which Trump won comfortably. An investigation by the American Association for Public Opinion Research (AAPOR) found the error was systematic, affecting nearly all pollsters simultaneously. The leading theory is partisan non-response bias: Trump voters were disproportionately unlikely to participate in polls, and that gap widened in 2020 as Trump consistently framed polls as "fake." No single weighting adjustment fully corrects for this if the underlying sample is already missing a certain type of voter.

2024: Closer, But the Lean Persisted

National polls in 2024 correctly showed a very close race; most final averages had Trump and Harris within 1-2 points of each other. Trump won by approximately 1.5 points nationally, so national polls were more accurate than in 2020. However, polls again showed a slight Democratic lean in most battleground states: Harris led Pennsylvania by 1-2 points in late polls, and Trump won it by about 2 points. The improvement was real, but the directional bias of consistently underestimating Republican support persisted for a third consecutive cycle. Many pollsters attempted to correct for 2020's errors by adjusting their likely voter screens; some overcorrected, some undercorrected.

Presidential Polling Accuracy: National Average Error by Cycle

Cycle | Final Poll Average   | Actual Result | Error              | Direction
2012  | Obama +1.4           | Obama +3.9    | 2.5 pts            | Underestimated Obama
2016  | Clinton +3.1         | Clinton +2.1  | 1.0 pts (national) | Slight Clinton overcount; large state errors
2020  | Biden +8.4           | Biden +4.5    | 3.9 pts            | Systematically overestimated Biden
2024  | Trump +0.3 (approx.) | Trump +1.5    | ~1.2 pts           | Slight Harris overcount; state errors persisted

Types of Polls: Phone, Online and IVR

Live Telephone Interviews (CATI)

The traditional gold standard. A live interviewer calls respondents, including cell phones (which federal law requires be dialed by hand rather than by autodialer), and records their answers. Response rates have fallen from 35-40% in the 1990s to under 6% today. Live phone polling is more expensive but generally higher quality, and better at reaching older and less digitally engaged voters. Firms like NYT/Siena, ABC/Washington Post and Marist use live phone polling.

Online Panel Polls

Respondents are recruited from large opt-in panels (people who have agreed to take surveys in exchange for incentives). Cheaper and faster than phone polling, but non-probability based — the sample is not drawn randomly from the general population. Quality varies widely depending on how the panel is recruited and how responses are weighted. YouGov, Ipsos and SurveyMonkey use online panel methods. These polls tend to oversample politically engaged and highly educated respondents.

IVR / Automated Polls ("Robopolls")

Interactive Voice Response polls use pre-recorded questions and respondents press buttons to answer. By law, robopolls cannot call cell phones — limiting them to landline-only samples. Robopolls are cheap, fast and produce large samples, but their cell phone exclusion creates a structural bias toward older, more rural respondents. Emerson, Rasmussen and PPP use robopolling for at least part of their samples.

Frequently Asked Questions

What is a margin of error in polling?

The margin of error reflects the range of uncertainty caused by sampling a subset of the population rather than interviewing everyone. A poll showing 48% support with a ±3% MOE means the true level is likely between 45% and 51%. Critically, the MOE only captures random sampling error — it does not reflect systematic errors from unrepresentative samples, bad likely voter models or poor weighting, which are the actual causes of most major polling failures.

What is a likely voter model?

A likely voter model is the set of criteria a pollster uses to determine which respondents count in their reported results. Because only 60-65% of eligible adults vote in presidential elections, pollsters filter out likely non-voters. Different firms use different criteria — self-reported likelihood to vote, past voting history from public voter files, enthusiasm scores and registration duration. The choice of likely voter model can shift results by 2-4 points and is one of the most consequential methodological decisions a pollster makes.

Why were the 2016 polls wrong?

National 2016 polls were mostly accurate — Clinton led by 3 points in averages and won the popular vote by 2.1 points. The problem was at the state level: polls in Michigan, Wisconsin and Pennsylvania missed by 5-7 points because they under-weighted non-college-educated white voters, who broke heavily for Trump. Many state polls also closed before the final two weeks, missing a late Trump surge after the Comey letter. The national vs. state divergence is why Clinton lost the Electoral College while winning the popular vote.

What is weighting in a poll?

Weighting adjusts for the fact that certain groups respond to polls at different rates than they exist in the electorate. If a poll's raw sample is 60% college graduates but only 38% of voters are college graduates, each college-graduate respondent counts for less than 1 and each non-college respondent counts for more than 1 in the final results. Pollsters weight simultaneously on age, race, gender, education, geography and sometimes past vote choice. Getting the weighting targets right — especially for a future election where turnout is uncertain — is the core methodological challenge.
