Why Predicting Election Outcomes Doesn't Work - Part 3



Building upon my previous blog posts about the futility of predicting US election outcomes with mathematical models, today we will take aim at the futility of one particular prediction method: opinion polling in the weeks before Election Day.

It is my opinion that opinion polling, the way it is widely conducted today, simply doesn't work. To understand why, let's consider a simplified thought experiment:

The curse of under-sampling

Imagine if voters in the US came in only two visibly distinct, impossible-to-mistake personas. You have the Reds, who have glowing red skin and always vote Republican, and the Blues, who have glowing blue skin and always vote Democrat. Let's also assume those two personas are so evenly mixed across the vast geography of the US that polling in any neighborhood would project well onto the entire nation. Finally, let's assume that we're on a national popular vote system, avoiding for the moment the added complexities of polling for an electoral college system.

In this simple setting, opinion polling reduces to a process of counting Reds and Blues passing by while you're sitting on the front patio of your favorite coffee shop somewhere downtown. Say 30 people pass by. You count 20 Red and 10 Blue, so you conclude there are more Reds than Blues out there in America, and therefore you predict a Republican win in the election next week.

The question is, how confident are you in your prediction? In other words, how likely are you to be wrong?

This sort of confidence measure is in fact mathematically workable. If you ask Mr. Mathematician to do it, you'll first have to suffer through a lecture featuring fancy-sounding things like 'central limit theorem' and 'normal distribution' and 'standard deviations below the mean' and 'Z table lookup'. Let him talk for a while and he'll finally tell you that in this case the probability of your prediction being wrong, given the tally you saw, is around 35%.

So, in this simplified world of only two voting personas, having polled 30 people, with the tally at 20 Red vs 10 Blue, the probability of your prediction in favor of the Reds being wrong is 35%. Realizing you can't live with such a high probability of error, you might decide to poll many more passers-by in order to get fairly confident about your prediction.

After having polled 1000 people, say you find that the tally is running pretty close at 562 Red to 438 Blue. Your prediction is the same as before: Red is more popular nationwide. But how likely are you to be wrong now?

Using the same formulas as before, Mr. Mathematician would say about 45%.

What? You mean by polling more people, a lot more people, the uncertainty you have in your prediction actually went UP?

Yes, it went up. By polling more people, you are now, as you ought to be, less confident that you can predict the outcome. What's happening here is the mathematical equivalent of the ancient wisdom "ignorance is bliss". By polling a small number of people, you are likely to get falsely reassuring answers simply due to serendipitous circumstances at the time of polling. Those reassuring answers might make it look like one side is winning by a safe margin, while in fact the race nationwide is tight and hard to call, as only wider polling might expose.

The trouble is, polling scientists are all too comfortable applying the same mathematical formulae to any pool of polling outcomes, faithfully producing confidence measures of the falsely reassuring "35%" caliber, and spewing those in your face as valid science.

To better understand how a problem like this should be approached, let's turn the experiment on its head. Let's now suppose the race really is tight, as it was in 2016, and try to find out how large a sample we need to poll in order to call the race correctly with high confidence.

Putting horrid math aside, computer scientists resort to computer simulations to do the heavy lifting needed to compute the answers. The neat little code snippet below will simulate today's divided America being polled with increasing sample sizes. But rather than one poll or ten, it will run millions of polls, each with a different randomly chosen sample of voters, then keep tabs on how often each polling-size strategy led to the right prediction of the winner of the national popular vote.
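A minimal sketch of such a simulation, in Python, could look like the following. The 51/49 "true" split (a stand-in for a 2016-sized popular-vote margin) and the one million polls per sample size are placeholder assumptions of mine, so the exact percentages it prints will differ somewhat from the figures quoted below:

    import numpy as np

    # Placeholder assumptions: a "true" nationwide split of 51/49 and one
    # million simulated polls per sample size.
    TRUE_RED_SHARE = 0.51
    N_POLLS = 1_000_000
    SAMPLE_SIZES = [100, 500, 1_000, 5_000, 10_000]

    rng = np.random.default_rng(0)

    print(f"Simulated {N_POLLS:,} nation-wide polls of each polling size...")
    print("Flipping a coin led to the wrong prediction 50% of the time")  # baseline
    for n in SAMPLE_SIZES:
        # Each poll draws n voters at random; each voter is Red with probability
        # equal to the true nationwide Red share.
        reds = rng.binomial(n, TRUE_RED_SHARE, size=N_POLLS)
        # The poll calls the race wrong whenever the true winner (Red) fails to
        # hold a strict majority of the sampled voters (ties count as wrong).
        wrong = np.mean(reds * 2 <= n)
        print(f"Polling {n:,} random voters led to the wrong prediction "
              f"{wrong:.0%} of the time")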


Running this little code snippet overnight on a modern MacBook Pro, you get the following insights:

Simulated a million nation-wide polls of each polling size...
Flipping a coin led to the wrong prediction 50% of the time  
Polling 100 random voters led to the wrong prediction 47% of the time
Polling 500 random voters led to the wrong prediction 37% of the time
Polling 1,000 random voters led to the wrong prediction 31% of the time
Polling 5,000 random voters led to the wrong prediction 13% of the time
Polling 10,000 random voters led to the wrong prediction 5% of the time

What this is saying, in a nutshell, is that a poll of some 100 randomly chosen voters has the accuracy of a mighty coin flip! Not surprising, since America happens to be so evenly divided and polling only 100 out of over 120 million voters is not going to expose the small margin of one party over the other.
In fact, according to this simulation, you'll need to have polled a good ten thousand voters before you can use their answers to predict the winner with reasonable confidence. And that's assuming a truly random sample, in a super-simplified setting where voters come in only two simple personas, without any political variation by age, gender, geography, or any other demographic!

In contrast to this, your typical poll at a respectable institution, armed with statisticians in suits and glasses, is based on only one thousand people or so [source]. This simple simulation shows that a poll of that size, even in the simplest of settings, will tend to be wrong over 30% of the time! Yet those statisticians will still claim that their numbers have a "sampling error" or "margin of error" below 5%. What they mean by those metrics tends to be convoluted and shady at best. In their neat mathematical bubble those numbers make sense, and they do come from the right-hand side of a complicated equation often involving probability distributions and statistical confidence measures. But in order to understand how silly their claims are, consider this:

Their sampling error is below 5%, but their ultimate prediction is going to be wrong 3 times out of 10. Go figure :)
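For what it's worth, the sub-5% figure usually refers to the textbook margin of error on the sampled share itself, not to the chance of calling the winner correctly. A quick back-of-the-envelope calculation (my own illustration, not any particular pollster's published methodology) shows where such a number comes from:

    from math import sqrt

    # Textbook 95% margin of error on a sampled proportion: z * sqrt(p*(1-p)/n).
    # p = 0.5 is the worst case; n = 1000 is the typical poll size cited above.
    n, p, z = 1000, 0.5, 1.96
    moe = z * sqrt(p * (1 - p) / n)
    print(f"Margin of error on the sampled share: +/- {moe:.1%}")  # about +/- 3.1%

A window of plus-or-minus three points around the sampled share says nothing about how often the winner gets called correctly when the true race is closer than three points.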

The reason your friends at the Bloomberg Politics polling department might be totally blindsided by that simple logical fallacy is that they generally don't get to redo their poll 10 times. Every election happens once, and they get to run their poll only once at some point during the weeks preceding the election.

As I run my simulated polls millions of times, I get a reliable sense of their average error rates. But these pollsters will run their poll once before the election, and then they'll either get their prediction validated or falsified on Nov 8. If validated, they will trumpet the win and brag about it for a long while. If falsified, they won't mention it ever again!

In fact, I have yet to find a polling agency with the good sense to do a postmortem on an erroneous poll prediction using data from the actual election. It's a valid question to ask: What was wrong with my polling sample that it led me to the wrong prediction? How many more people should I have polled for my conclusion to have been correct?

'Under-sampling' is the technical term for the problem those poll-happy statisticians choose to ignore. Next time you meet one of them at a cocktail party, I suggest you ask them this:
"If you fill a bucket of water out of the Pacific, and you find it doesn't contain any Blue Whales, would you call up CNN to tell them that the Blue Whale just went extinct?"

Not one but fifty races

The problem is compounded with the introduction of the electoral college system. Instead of one national poll, you effectively have to run 50 polls, one per state. Because you have no idea how evenly divided the electorate is in each state, you'd have to assume an even divide and end up needing a large sample in every state. Remember from my previous blog post that no information from previous elections can reliably be used to estimate the state-by-state division of voters, which is another common mistake pollsters fall into.

Polls are hopeless under the electoral college system, basically because there are many more ways to get it wrong. In order to call the election correctly, you'd have to get not just one poll right, but many separate polls, one per state. Get one important state wrong and your entire prediction is shot.

In addition, breaking a total poll sample of, say, 1000 people across the 50 states proportionally by population means that many states will necessarily be underrepresented; some will get an allocation of only a dozen people, if that. With many of the smaller states running tight races, you have no hope of calling them correctly.

To prove I'm right, I modified my previous computer simulation of millions of polls with varying sizes. This time it doesn't simulate one national poll, but a prediction based on separate state-by-state polls rolled up into an electoral college projection. The number of people polled per state is determined by the proportion of American voters who live in that state.

I'll save you the program code (which you can read here if you're a fellow geek) and skip to the output:

Simulated 50 million state-wide polls of each nation-wide polling size...
Flipping a coin led to the wrong prediction 50% of the time
Polling 100 random voters led to the wrong prediction 64% of the time
Polling 500 random voters led to the wrong prediction 69% of the time
Polling 1,000 random voters led to the wrong prediction 65% of the time
Polling 5,000 random voters led to the wrong prediction 43% of the time
Polling 7,500 random voters led to the wrong prediction 37% of the time
Polling 10,000 random voters led to the wrong prediction 34% of the time
Polling 25,000 random voters led to the wrong prediction 24% of the time
Polling 50,000 random voters led to the wrong prediction 18% of the time
Polling 100,000 random voters led to the wrong prediction 11% of the time

What this is saying is that, given America's electoral college system, predicting the winner by polling fewer than 5,000 people nationwide is a colossal waste of effort: you'll do better by flipping a coin!

In fact this simulation effectively confirms what common sense dictates:

In a present-day America so vast and so polarized, with so many states locked into tight, consequential races, your nationwide poll is going to have to include some 100,000 people before you can reliably call every one of those state-specific races and thereby predict the outcome of the election as a whole.
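For the curious, a rough sketch of what such a state-by-state simulation might look like is below. The synthetic state sizes, per-state margins, and electoral-vote allocation are crude placeholders of my own, not the inputs behind the output quoted above, so the percentages it prints won't match exactly:

    import numpy as np

    # Crude placeholders: 50 synthetic "states" with made-up population weights,
    # electoral votes roughly proportional to population plus two senator votes,
    # and tight races in most states.
    N_STATES = 50
    N_POLLS = 100_000
    rng = np.random.default_rng(1)

    pop_share = rng.dirichlet(np.ones(N_STATES))   # fraction of voters per state
    electoral_votes = np.maximum(1, np.round(pop_share * 436)).astype(int) + 2
    true_red_share = np.clip(rng.normal(0.50, 0.03, N_STATES), 0.01, 0.99)
    red_truly_wins = electoral_votes[true_red_share > 0.5].sum() > electoral_votes.sum() / 2

    for total_sample in (1_000, 10_000, 100_000):
        # Split the national sample across states in proportion to population,
        # with at least one respondent per state.
        per_state = np.maximum(1, np.round(pop_share * total_sample)).astype(int)
        # Simulate N_POLLS independent polls of every state at once.
        reds = rng.binomial(per_state, true_red_share, size=(N_POLLS, N_STATES))
        state_called_red = reds * 2 > per_state
        ev_for_red = np.where(state_called_red, electoral_votes, 0).sum(axis=1)
        poll_says_red = ev_for_red > electoral_votes.sum() / 2
        wrong = np.mean(poll_says_red != red_truly_wins)
        print(f"Polling {total_sample:,} random voters led to the wrong prediction "
              f"{wrong:.0%} of the time")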


Of course, your friendly neighborhood pollster will try to mitigate this problem by polling large numbers of people from "purple" states and far fewer people from "red" or "blue" states. This piles a bad assumption on top of an already flawed polling methodology, rendering the prediction even less reliable. Refer to my previous posts about why treating states as red or blue based on historical trends is a big fat fallacy.

Not two but multiple political personas

The number of people you need to poll depends on the number of kinds of people you expect to be out there. So far we've assumed only two political personas, with perfectly uniform characteristics. But America of course is home to more political personas than one can count.

You have the "Latino millennial college girl", for example, and the "White middle-aged high-school-educated hipster", not to mention the "Black graduate-degree big-city liberal" or the "Korean first-generation-immigrant grandma", and the list goes on. Depending on the candidates on the menu for a particular election, each one of those demographics might side with one particular candidate under whatever complex political pressures, however indirect. Unless you want to base your prediction on guesswork, you'd have to make sure your sample covers each of the biggest demographics, as well as each of the demographics that might happen to matter in this election, which are anybody's guess.

To recap, that's fifty different polls, one per state, each large enough to cover every demographic that one might expect to matter: old vs young, white vs black vs latino, liberal vs conservative, rich vs poor, educated vs not, blue collar vs white collar, Christian vs Jew vs Buddhist, secular vs religious, and the list goes on and on.

In fact, since the number of persona combinations, and with it the required poll sample size, grows exponentially with the number of demographic attributes you plan to cover, the problem is compounded to the point where under-sampling becomes a hurdle significant enough to be called a show stopper.
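A quick back-of-the-envelope calculation illustrates the blow-up; the attribute count and per-cell minimum below are arbitrary numbers for illustration, not figures from any real poll:

    # With k binary demographic attributes, personas combine into 2**k distinct
    # cells, and covering each cell with a minimum number of respondents
    # multiplies the required sample accordingly.
    ATTRIBUTES = 10   # e.g. age bracket, race, gender, education, income, region, ...
    PER_CELL = 100    # respondents wanted from each persona cell

    cells = 2 ** ATTRIBUTES
    print(f"{cells:,} persona cells -> at least {cells * PER_CELL:,} respondents")
    # prints: 1,024 persona cells -> at least 102,400 respondents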

Under-sampling is the least of your problems


Of course, under-sampling isn't the only curse plaguing election polls.

Sampling bias is another beast. All of the logic explained above requires a truly randomized sample of voters in order to work properly. But in real life, whatever method you choose to reach out to people in a poll, you won't get a truly random sample. Rather, you'll get a sample biased towards the types of people who respond well to your method of polling. Poll online, and you'll wind up with a bunch of online dwellers. Poll by phone, and you'll wind up with a bunch of phone-answering people (I for one wouldn't pick up if I didn't recognize the caller). Even if you poll using each and every method available, you'll still get a bunch of people who tend to respond to polls in the first place, which isn't everybody.

The Heisenberg effect is yet another challenge: with election poll results swarming headline news on every outlet like a modern-day plague, those massively publicized predictions can actually affect people's attitudes towards the election to a significant degree. Predict one candidate to win by a safe margin, and lots of people might decide not to bother to go vote for her. Predict one candidate to lose, and you might unintentionally help him build a sizeable base of supporters who subscribe to the 'support the underdog' mentality. If the race is too close to call, predicting in favor of one candidate might tip the balance by compelling undecided voters to 'get on the winning bandwagon'.

If you're having a hard time imagining the full effect of this problem, imagine how differently students might answer a multiple-choice exam if you were to give them minute-by-minute updates on how popular each answer choice is among the class as a whole.

And finally, dear poll scientist, if all of the above isn't enough, consider the following sad reality: your predictions are only as good as your subjects' commitment to give you genuine answers. The people you reach have no particular reason to take the poll seriously. They might lie, misinform, or just give random answers. Depending on how you conduct the poll, people might not feel comfortable telling you their real choices.

Only in the privacy of a voting booth can you truly be sure people are giving their real choices. That is, of course, why we go through the trouble of conducting elections the way we do in the first place, rather than calling people up and asking them to vote by phone.
