Why Predicting Election Outcomes Doesn't Work - Part 1

Like most of you, I was surprised by the outcome of the election fiasco last night.

But unlike most of you, I wasn't surprised in the least that most opinion polls and projections got it dead wrong. I was only surprised that so many millions of fellow Americans could exercise what I see as horribly poor judgment.

In fact, I have always held the minority opinion that US election outcome predictions are crap. All those polls and models, I never really put any trust in them. They just don't work. And while last night isn't necessarily proof that I'm right, it sort of gave me a good excuse to speak up and share the logic behind my view.

Unable to swallow last night's blunder, and feeling helpless to swerve the carriage of America away from the bottomless abyss it's about to plunge into, I decided to give up on politics and try instead to preach on a topic where I actually know what I'm talking about: data science and the statistical myths that surround the modern electoral process here in the US.

So like I said, I wasn't expecting Hillary to win based on any of those sciency-sounding pollsters. Their predictions didn't sway my expectations one way or another. I was watching ballot counts come in with no prior expectations, but with hope that most Americans might have the common sense to realize that in this election one candidate is a less horrible choice by a wide margin. My surprise came when tens of millions of Americans gave my common-sense expectations the collective middle finger...

But why did I not bother with those "scientifically sound" predictions backed by the most respectable media outlets and vetted by sophisticated statisticians in preppy suits and smart glasses? Because they're based on faulty logic.

Today I'm going to talk about the falsehood of projecting trends from past elections onto future elections. I'll leave the myth of opinion polling for a dedicated Part 2 article, so here goes:

In elections, the past isn't a good predictor of the future


During this election, I lost count of the number of times somebody threw a prediction at me based on a past trend like:
  • "The young Latino population in Florida has always voted Democrat" or 
  • "No Republican has ever won an election without winning Ohio" or
  • "California will of course vote Democrat as usual"
This might come as a surprise to you, but I didn't even take California's vote for granted. I was full of hope that fellow Californians (the way I know them to be) would not side with somebody like Trump. And to my relief, they didn't. But the point is that there was no relevant historical precedent for how the Californian electorate of 2016 would feel about a Donald Trump, and so I was warranted in feeling anxious watching an important social experiment unfold for the first time. Any sense of security about the outcome based on the past would have been unwarranted.

It was amusing, in contrast, to watch CNN science-men flip a giant touch screen between 2012 and 2016 and insist that comparing ballot counts on a county-by-county basis is a good way to forecast the outcome this time around. And they stayed loyal to this method even as the night progressed and showed all of their projections to be dead wrong!

And that's the Almighty CNN...

People, this election is happening for the FIRST TIME. Those candidates are competing for your vote for the first time. There are no relevant past trends because this event is unprecedented. Never in the history of mankind has Hillary Clinton run against Donald Trump!

True, there are underlying demographic trends that contribute to the vote, but the effect of those trends is going to be negligible compared to the effect of conscious public opinion. You see, assuming your fellow American voters are neither idiots nor programmable machines, they are going to base their votes on:
  • The actual candidates
  • The current state of the nation
  • Recent national and international events
  • Personal interests related to their everyday lives
And as every single item on this list will have changed since previous elections, and will have changed in ways that are too complex for your favorite statistician to model, it follows that past trends are not good predictors of the future in this domain.

So next time you feel like using past elections as the basis for your prediction of the next one, consider the following:
  • Since the last election, the people running have changed!
  • Since the last election, and precisely because of the outcome of the last election, the political atmosphere has changed
  • Since the last election, events have happened that matter
  • Since the last election, social norms have shifted, as has the collective American worldview
  • Since the last election, people have aged!
This last point is so obvious, yet so often overlooked, that I'm compelled to spell it out: Since the last election, time has passed. Older voters have died, new human beings have entered the electorate, and a sizeable chunk of immigrants (myself included) have been given the privilege to vote. If the last election had been held last week, then one might be forgiven for projecting age-group demographic trends onto the current election. But over four-year cycles, demographic trends just don't hold.

That is to say, since the last election, your 18-21 "demographic" has been completely replaced by a new 18-21 cohort about whom you know nothing. Your 25-35 demographic has been polluted by an influx of people who are very different from any "25-35 year olds" that have ever lived.

I ask you to pause and think about this for a moment. Today's 25-35ers are humans who have come to maturity in the age of the Internet. This planet has never before seen adults of that sort. Think about how differently the age of the Internet would have shaped their outlook on the world, on America, on race and gender issues. Do you still think you can use the 25-35ers of the time of Bush Sr. as a good model upon which to predict the choices of today's 25-35ers?

What goes for age demographic trends also goes for gender, racial, geographic, and socio-economic demographics. But you get the point so I won't hammer it more.

Projecting past trends is a powerful data science tool when you're looking at cyclical phenomena, like the weather, or seasonal shopping demand. It also works well, at least in aggregate, on systems governed by natural law, like earthquake frequencies or the long-run odds at roulette. The problem arises when highly educated mathematicians decide to apply it to the domain of human behavior at the scale of US elections, which spans hundreds of millions of people and stretches over decades of time.

This hopefully explains why I wouldn't take even California's vote for granted. You see, there is no such thing as a blue state. Only a state that has always voted blue in the past. And the difference is significant, because a state's electorate is constantly changing, and what constitutes the color 'blue' on the ballot is ever-changing as well.

If you don't believe me, put my logic to the test. I challenge you to look at the US electoral map across the past 20 election cycles and produce any system that can predict a state's color for a cycle based on its color for the last few cycles in a reliable way!
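If you want to take me up on that, here is a minimal sketch in Python of how one might score such a system. It is only an illustration under my own assumptions: the function names majority_of_last_k and backtest are made up for this post, the rule being tested (pick whichever color won most of the last k cycles) is just one candidate "system", and the toy data is a placeholder, NOT real election results. You would swap in the actual state-by-state colors yourself.

```python
from collections import Counter


def majority_of_last_k(history, k=3):
    """Predict the next color as the most common color in the last k cycles.

    `history` is a list of colors like ["R", "D", "D"], oldest first.
    Use an odd k so a two-color history cannot tie.
    """
    recent = history[-k:]
    return Counter(recent).most_common(1)[0][0]


def backtest(results_by_state, k=3):
    """Return the fraction of (state, cycle) pairs the rule predicts correctly.

    For each state, every cycle that has at least k earlier cycles is
    predicted using only the cycles that came before it.
    """
    hits = 0
    total = 0
    for colors in results_by_state.values():
        for i in range(k, len(colors)):
            prediction = majority_of_last_k(colors[:i], k)
            hits += prediction == colors[i]
            total += 1
    return hits / total if total else float("nan")


# Hypothetical placeholder data, NOT real election results.
toy_results = {
    "State A": ["R", "R", "D", "D", "D", "R"],
    "State B": ["D", "D", "D", "D", "D", "D"],
}

if __name__ == "__main__":
    print(f"Hit rate on toy data: {backtest(toy_results, k=3):.2f}")
```

Whatever hit rate you get on real data, compare it against the states and cycles that actually decided each election, since those are the ones where a prediction matters.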

In fact, arguing that a state will vote Democrat (or Republican) because it always did before is a lot like arguing that a chess player will not play his knight on the next move because he hasn't moved his knight since the start of the game. The player will readily move the knight, of course, whenever the situation calls for it, with full regard to the current board situation and no regard to past move trends.

So next time somebody starts throwing election predictions at you based on how past elections played out, make sure to stop them and point out that elections, unlike lunar eclipses, are not predictable that way. Remind them that it really is up to the people to decide an election, which they will on Nov 8, and that since the people haven't decided yet, it is silly to pretend to know the outcome with any certainty.

I leave you with that thought for now, and if I can muster the energy I will be writing about the myth of opinion polling next time around.

