Voting matters, in theory

People vote for a variety of reasons: as a civic duty, as a social signal, as support for an ideology, etc. But the most fundamental reason to vote is to win elections.

In 2016, my brother told me that voting (in the perennial swing state of Pennsylvania) wasn't worth his time because one vote was so unlikely to change the election outcome. I don't think this argument holds water for even mildly altruistic people. 80,000 Hours analyzed the efficacy of one vote, drawing on empirical studies that claim that a voter in a swing state has around a 1 in 10 million chance of casting the sole deciding vote in a US presidential election. The problem is that the arguments in these studies can be hard to parse and easy to ignore for reluctant voters like my brother.

I want to present a simple theoretical analysis that shows that voting matters a great deal in swing states. The crux of the analysis is this asymptotic claim:

Claim: In a two-party election with n voters with polling mean μ and polling variance σ², if we set k = |0.5 - μ|/σ, then the probability that one vote is decisive is θ(e^(-k²/2)/(σn)). In particular, if σ and k are constant, the probability is θ(1/n).

This claim is not new; see Margolis (1977): The Probability of a Tie Election for a derivation including constant factors. But I think that presenting a simple approach to prove this result is valuable for both illustrative and rhetorical purposes.

In the rest of this analysis, I motivate our interest in this claim, lay out my assumptions and prove the claim, and then discuss how valid those assumptions are.

Motivation

The outcome of an election affects the entire governed population. In general, this population is a superset of voters. In the US, most of the population are not voters: children, felons, and non-citizens cannot vote, and most eligible adults choose not to. Turnouts are around 55% for presidential elections and around 48% for midterms.

To get a lower bound for election impacts, we can consider laws like the Affordable Care Act and the 2017 tax cuts, neither of which would have passed if the opposition had held power. These laws reallocated hundreds of billions of dollars each and affected most of the country. They show that an election effect size of order θ(n) is plausible. Combined with our claim above, we see that the social impact of voting is independent of population size - if k is constant, θ(n/n) = θ(1) - and is boosted by low turnout.

The claimed exponential dependence on k explains the colloquial distinction between "swing states" and other states. Even when federal races are uncompetitive, though, the ballot may include state or local races for which a vote has high social impact. As for primaries, when one party is dominant, impacts can still be high if its candidates have sufficiently different positions, but in closer races, their chances of winning are critical.

Proof

Assume the following hierarchical model of the election results:

1. Draw a probability p from a normal distribution with mean μ and variance σ².
2. Draw a vote count for one party from a normal distribution with mean np and variance θ(n). They win if this count exceeds n/2.
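To make the two stages concrete, here is a minimal Python sketch that samples from this model. The names simulate_election, win_fraction, and vote_sd_scale are my own illustrative choices, and the constant 0.5 in front of √n is an assumption (it matches the variance n/4 of n fair coin flips); the model itself only requires variance θ(n).

```python
import random

def simulate_election(n, mu, sigma, rng, vote_sd_scale=0.5):
    """Draw one election from the two-stage model; return one party's vote count."""
    # Stage 1: the electorate's aggregate lean p ~ Normal(mu, sigma^2).
    p = rng.gauss(mu, sigma)
    # Stage 2: vote count ~ Normal(n * p, variance proportional to n).
    # Assumption: standard deviation = vote_sd_scale * sqrt(n).
    return rng.gauss(n * p, vote_sd_scale * n ** 0.5)

def win_fraction(n, mu, sigma, trials=20000, seed=0):
    """Fraction of simulated elections in which the party's count exceeds n/2."""
    rng = random.Random(seed)
    wins = sum(simulate_election(n, mu, sigma, rng) > n / 2 for _ in range(trials))
    return wins / trials
```

For example, win_fraction(1_000_000, 0.53, 0.03) estimates how often a party polling one standard deviation above even goes on to win under these assumptions.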

The key idea of our proof is to find an interval [0.5 - ε, 0.5 + ε] such that if p falls into this interval, then there is a substantial chance of an evenly-split vote.

We can analyze the vote count from step 2 by modeling it as a sum of n independent, identically distributed Bernoulli random variables of mean p. Their sum has mean np and variance np(1-p), which is θ(n) for p in our interval. We may have to scale this sum around np to match the variance from our original model, but doing so only changes ε by a constant factor. From now on, we assume this Bernoulli decomposition.

If p is exactly 0.5, then the probability that a vote is decisive - that is, the probability of an even split - is (n C n/2)/2ⁿ. By Stirling's approximation, this expression is θ(1/√n).
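As a quick numerical check of the Stirling step, the sketch below compares the exact even-split probability with the estimate √(2/(πn)). The function names are mine, and n is assumed to be even. I use log-gamma so that the huge binomial coefficient never has to be formed directly.

```python
import math

def even_split_probability(n):
    """Exact probability that n fair votes split n/2 to n/2 (n assumed even)."""
    # log of (n C n/2), computed via log-gamma to avoid overflow.
    log_binom = math.lgamma(n + 1) - 2 * math.lgamma(n / 2 + 1)
    return math.exp(log_binom - n * math.log(2))

def stirling_estimate(n):
    """Leading-order Stirling estimate of the same probability."""
    return math.sqrt(2 / (math.pi * n))

for n in (100, 10_000, 1_000_000):
    ratio = even_split_probability(n) / stirling_estimate(n)
    print(n, even_split_probability(n), ratio)
```

The ratio in the last column should approach 1 as n grows, which is all the θ(1/√n) statement needs.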

How large can ε get such that this asymptotic bound still holds for p in [0.5 - ε, 0.5 + ε]? The probability of an even split when p = 0.5 + ε is:

	(n C n/2) · (0.5 + ε)^(n/2) · (0.5 - ε)^(n/2)
=	(n C n/2) · (0.25 - ε²)^(n/2)
=	(n C n/2)/2ⁿ · (1 - 4ε²)^(n/2)

As long as ε ≤ 1/√n, the term on the right is θ(1), since (1 - 4/n)^(n/2) approaches e^-2, and the whole expression is θ(1/√n). The probability that p falls into this narrow range is roughly f(0.5) · 2/√n, where f is the density of N(μ, σ²), since for constant σ the density is roughly constant over the entire interval. Multiplying by θ(1/√n) and expanding f(0.5) = e^(-k²/2)/(σ√(2π)) gives us our claim.
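The claim can also be checked numerically. The sketch below assumes the Bernoulli decomposition and an even n. It integrates the even-split probability against the normal density for p with a plain trapezoid rule, then compares the result with e^(-k²/2)/(σn). All function names, the step count, and the integration width are my own choices, not part of the proof.

```python
import math

def log_tie_given_p(n, p):
    """Log-probability that n Bernoulli(p) votes split exactly n/2 to n/2."""
    log_binom = math.lgamma(n + 1) - 2 * math.lgamma(n / 2 + 1)
    return log_binom + (n / 2) * (math.log(p) + math.log(1 - p))

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def tie_probability(n, mu, sigma, steps=4001, width=8.0):
    """Trapezoid-rule integral of P(tie | p) * density(p) over p near 0.5."""
    # P(tie | p) is negligible more than a few multiples of 1/sqrt(n)
    # away from 0.5, so integrate only over 0.5 +/- width / sqrt(n).
    half = width / math.sqrt(n)
    lo, hi = 0.5 - half, 0.5 + half
    h = (hi - lo) / (steps - 1)
    total = 0.0
    for i in range(steps):
        p = lo + i * h
        weight = 0.5 if i in (0, steps - 1) else 1.0
        total += weight * math.exp(log_tie_given_p(n, p)) * normal_pdf(p, mu, sigma)
    return total * h

def claimed_rate(n, mu, sigma):
    """The claim's expression e^(-k^2/2) / (sigma * n), constants dropped."""
    k = abs(0.5 - mu) / sigma
    return math.exp(-k * k / 2) / (sigma * n)

for n in (10**5, 10**6, 10**7):
    print(n, tie_probability(n, 0.47, 0.03) / claimed_rate(n, 0.47, 0.03))
```

If the claim holds, the printed ratio should stay near a constant as n grows. In this sketch it settles around 0.4, which is the 1/√(2π) from the normal density that the θ notation hides.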

Validity

Assigning a probability to an election outcome is really about defining where our uncertainty about the event comes from. The two-stage hierarchical model from the proof above accounts for two kinds of uncertainty that we should have:

The first stage reflects our uncertainty about the electorate's aggregate behavior.
The second stage says that even if we understand this aggregate behavior, we should still be unsure about the election-day behavior of individual voters.

The specific distribution in the first stage matters little; we only need it to maintain a relatively high density over the interval [0.5 - ε, 0.5 + ε]. We should justify the normal-distribution assumption and the order-of-magnitude variance in the second stage.

Polling provides us critical data about the electorate in aggregate, but even carefully-constructed polls with good sample sizes have error bars around ±2-4%. These error bars are with respect to the population being sampled, not the election result.

We might try to reduce polling uncertainty by taking an average across polls. Doing so is not trivial. In the US, most pollsters have an empirically-detectable left- or right-leaning bias. Different polling methodologies also introduce bias, like using landlines (more likely to include older voters) or polling all registered voters instead of likely voters (more likely to include younger voters).

When we talk about the polling mean and polling variance, then, we're talking about a statistical model that weights polls and corrects for these effects, such as the ones by FiveThirtyEight. And even these models have uncertainty around ±2-4% with respect to the election result, justifying the first type of uncertainty.

Where does the second type of uncertainty come from? The proof above considers an electorate composed of identical voters who each vote randomly on the day of the election. In truth, most voters make their decisions far in advance and different segments of the electorate behave very differently. However, there are many independent sources of voter-level randomness that can affect an election result:

Some voters may be undecided about their vote, deciding based on recent news.
Some voters may choose not to vote, deciding based on time or convenience.
Some voters may be unable to vote, due to a busy schedule at work, a lack of transportation, or various procedural measures to discourage turnout.

Even if we have a highly structured understanding of the electorate - for example, a breakdown into (say) 45% Republicans, 45% Democrats, and 10% swing voters - these groups will still be of size θ(n) each, and the overall vote counts from each group will be normally distributed with variance θ(n), justifying the second type of uncertainty.
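A small sketch shows why the variance stays linear in n for a structured electorate. It assumes each voter acts independently, and the per-group probabilities below are purely hypothetical numbers I picked for illustration. It also ignores turnout and only models which party a voter picks.

```python
def vote_count_variance(n, groups):
    """Variance of one party's vote count when voters act independently.

    groups: list of (share_of_electorate, prob_of_voting_for_party) pairs.
    Each group contributes share * n Bernoulli voters, so variances add.
    """
    return sum(share * n * q * (1 - q) for share, q in groups)

# Hypothetical breakdown: 45% loyal to the party, 45% loyal to the
# opposition, 10% swing voters. The probabilities are illustrative only.
groups = [(0.45, 0.95), (0.45, 0.05), (0.10, 0.50)]
for n in (1_000, 1_000_000):
    print(n, vote_count_variance(n, groups))
```

Doubling n doubles the variance for any fixed breakdown, so the structure only changes the constant in front of n.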

Finally, we should consider whether the asymptotic analysis above loses too many constant factors to be of practical use. The easiest way to do that is to note where we gain and lose these factors and their rough order of magnitude:

There's a factor of √(2/π) ≈ 0.8 from Stirling's approximation: about -0.1 magnitudes.
There's a factor of 1/e² from substituting for ε in the term above: -1 magnitude.
There's a factor of 1/√(2πσ²) in the density function. If σ ~ 2-4%: +1 magnitude.
High turnout is still just a third of the governed population: +0.5 magnitudes.

These corrections roughly cancel out. Thus, 1/n is a good estimate for the chance that a single vote changes the outcome of a race with a polling mean around 46-54% and polling standard deviation around 2-4%. According to the result from Margolis's paper, that's actually a slight underestimate of the probability, but at that point other corrections (like accounting for the structure of the electorate) start to matter as well.
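To see how much the e^(-k²/2) term moves across that range of polling means and standard deviations, here is a short sketch that tabulates it. The function name closeness_factor is mine, and the grid of values is just a sample of the range mentioned above.

```python
import math

def closeness_factor(mu, sigma):
    """The e^(-k^2/2) term from the claim, with k = |0.5 - mu| / sigma."""
    k = abs(0.5 - mu) / sigma
    return math.exp(-k * k / 2)

for mu in (0.46, 0.48, 0.50):
    for sigma in (0.02, 0.03, 0.04):
        factor = closeness_factor(mu, sigma)
        print(f"mean={mu:.2f} sd={sigma:.2f} factor={factor:.3f}")
```

The least favorable corner of the grid (a mean 4 points from even with a 2-point standard deviation) has k = 2, so the factor there is e^-2, a bit under one order of magnitude below the dead-even case.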

The empirical analyses and close historical elections cited in the 80,000 Hours post provide further evidence that these constant factors are reasonable.

Summary

This analysis suggests that for an altruistic voter (that is, a voter who cares about outcomes for everyone in their city, state, and country), voting is roughly equally effective at all levels of government. Since elections cause the reallocation of thousands of dollars per voter, a 1/n chance of changing the outcome implies high social impact. That's similar to the result of the 80,000 Hours post, even though we arrived at it in a different way.

If you live in a swing state, vote! If any other races on your ballot are competitive, vote! And even if the winners are all but decided, vote - it's good to develop a habit, if only to prepare for elections where the impact of voting is high.