Why Predicting Election Outcomes Doesn't Work - Part 2

Building on my blog post from last week, and before I move on to debunk the myth of reliable opinion polling, I want to spend more time on the futility of basing election predictions on the outcome of past elections, even at the state level.

This logical fallacy is so deeply entrenched in the collective modern mindset that I've been called ridiculous just for challenging it. To me that means it merits a deeper dive to unravel.

Let's start with a thought experiment where prediction based on past trends DOES work. Imagine you are rolling a loaded dice. Unknown to you, it happens to be rigged to land on 6 every time. As you roll the dice over and over again, your mind takes you through the following line of reasoning:

  • I got a 6 the 1st time I rolled the dice
  • I got a 6 the 2nd time I rolled the same dice
  • I got a 6 the 3rd time I rolled the same dice
  • ... 
  • I got a 6 the 100th time I rolled the same dice
  • Therefore this dice is loaded and will always roll a 6
  • Therefore I'll get a 6 next time I roll this dice

This logic is solid. And your prediction will most likely prove correct. In fact even if the dice wasn't perfectly set to land on 6 all the time, you can still say something meaningful about the probability of it landing on 6 in a future roll based on past trends. This sort of mental exercise is called forecasting, and data science geeks happily apply it all the time with proven track records of success.

But note how this forecasting business relies entirely on your ability to repeat the same experiment multiple times and keep a tally on the outcomes. In this case you grabbed the same dice and rolled it again and again on the same table. Between experiment runs, the subject was the same, the environment was the same, and the agent (you) was one and the same.
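To make the dice case concrete, here is a minimal Python sketch of that tally-and-forecast exercise. Everything in it is my own illustration: the function names, the fixed random seed, and the 0.9 loading (a dice that is rigged, but not perfectly) are assumptions, not measurements of any real dice.

```python
import random

def roll_loaded_dice(p_six, rng):
    """Return 6 with probability p_six, otherwise a uniform face from 1 to 5."""
    if rng.random() < p_six:
        return 6
    return rng.randint(1, 5)

def estimate_p_six(rolls):
    """Tally past outcomes: the fraction of rolls that came up 6."""
    return sum(1 for r in rolls if r == 6) / len(rolls)

rng = random.Random(0)   # fixed seed so the sketch is repeatable
TRUE_P_SIX = 0.9         # hypothetical loading, unknown to the person rolling

# Same dice, same table, same roller: the experiment is repeated 10,000 times.
rolls = [roll_loaded_dice(TRUE_P_SIX, rng) for _ in range(10_000)]
estimate = estimate_p_six(rolls)

print(f"estimated P(6) after {len(rolls)} rolls: {estimate:.3f}")
```

Because every roll comes from the same dice, the tally settles close to the true loading, and that is the only reason the forecast for the next roll can be trusted.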

Let's now look at the other extreme. Consider your friend applying the same roll-the-dice logic as follows:

  • America won back in the American Revolution
  • America won back in the War of 1812
  • America won back in the Mexican War
  • America won back in the Spanish-American War
  • America won back in WWI
  • America won back in WWII
  • Therefore America always wins at war
  • Therefore America will win the war in Vietnam

I think most people have a gut feeling for how wrong this reasoning is. And in any case, America lost in Vietnam. The interesting question is why this line of reasoning is wrong even though it seems identical to the flawless roll-the-dice reasoning above.

The problem is, unlike rolling a dice time after time, going to war is not a repeatable experiment. While it is possible to roll the same loaded dice twice, it is impossible to wage the same war twice. Specifically, on a time scale wide enough to encompass multiple experiments (in this case multiple war events), the subject of the experiment (in this case America) is continually transforming. The context of the experiment is also never the same. Every war is waged in a different geographic zone, against different enemies, along with different allies, using different weapons, for different goals, and by a different generation of fighters.

So fighting multiple wars is not like rolling the same dice multiple times. It is more like rolling a different dice each time, with no prior knowledge of how each dice might be loaded. In this situation, it is impossible to use trends from past rolls to predict the outcome of the next roll.
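The contrast can be sketched in Python as well. This is only a toy under a strong assumption of mine: each new dice gets a loading drawn independently and uniformly at random, so nothing about earlier dice carries over to the next one. The function names, seed, and sample sizes are likewise hypothetical.

```python
import random

def error_same_dice(history_len, trials, rng):
    """Average gap between the past-roll tally and the true loading
    when every roll uses the SAME dice."""
    total = 0.0
    for _ in range(trials):
        p = rng.random()  # one dice with a fixed, unknown chance of a 6
        sixes = sum(1 for _ in range(history_len) if rng.random() < p)
        total += abs(sixes / history_len - p)
    return total / trials

def error_new_dice_each_time(history_len, trials, rng):
    """Average gap between the past-roll tally and the NEXT dice's loading
    when every roll uses a DIFFERENT, independently loaded dice."""
    total = 0.0
    for _ in range(trials):
        sixes = 0
        for _ in range(history_len):
            p_i = rng.random()        # this roll's own dice
            if rng.random() < p_i:
                sixes += 1
        p_next = rng.random()         # the next dice, loaded independently
        total += abs(sixes / history_len - p_next)
    return total / trials

rng = random.Random(0)
same_short = error_same_dice(10, 1000, rng)
same_long = error_same_dice(500, 1000, rng)
new_short = error_new_dice_each_time(10, 1000, rng)
new_long = error_new_dice_each_time(500, 1000, rng)

print(f"same dice:      error {same_short:.3f} -> {same_long:.3f}")
print(f"new dice/roll:  error {new_short:.3f} -> {new_long:.3f}")
```

With the same dice, a longer history shrinks the error toward zero. With a new dice on every roll, the error stays large no matter how long the history is, because the tally says nothing about how the next dice happens to be loaded.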

The logical fallacy central to my argument can now be expressed as follows:

  • California voted Democrat in 1992
  • California voted Democrat in 1996
  • California voted Democrat in 2000
  • ...
  • California voted Democrat in 2012
  • Therefore California is a Democrat state
  • Therefore California will vote Democrat in 2016

In fact, elections belong in the unpredictable 'war' category, not the 'roll-the-dice' category. Whatever probability you think applies to California voting Democrat in 2016, it is not possible to validate your estimate, because the experiment can only happen once. California's vote in 2020 is not analogous to another roll of the same dice; it is analogous to a roll of a different dice, whose odds are independent of the 2016 election dice.

If you still want to slap a prior probability on California's vote in 2016, go ahead and do it. Just please don't base it on the outcome of prior elections, but rather on the current landscape in 2016: the candidates, the current electorate of California, the current political atmosphere, etc.

And if you still don't buy this wholesale argument, suit yourself. Just remember my words when, some time in the future, California happens to vote Republican. As the world of mathematical modelers in suits and glasses reels in shock, I will be sitting in some coffee shop completely unfazed. And though you won't hear it from me, I will be thinking to myself: 'told ya!'
