Why Were the Polls Wrong?

My models were wrong because the polls were wrong (polls are the data used to build the models). I was wrong in four states: Florida, Wisconsin, Pennsylvania, and Michigan (Michigan is still too close to call, but Trump holds a small lead). In those four states more than 23 million votes were cast and the deciding margin was fewer than 230,000 votes (less than 1%). Wisconsin was probably the most surprising state. Trump campaigned there a few times (the Clinton campaign did not), and it was the only state in the country where he outspent Clinton on TV ads. The Trump campaign obviously did a better job there. With both campaigns spending the waning days in Pennsylvania and Michigan, it seemed the race there might be tighter than the polls indicated. Not one poll all year had Trump ahead in Wisconsin, and only one poll had Trump ahead in Michigan and Pennsylvania the entire year.

I gave Trump a 35% chance of winning (higher than everyone else, including 538), and I even said that if he won it was highly probable that Clinton would win the popular vote (and that appears to be the case). Why did I give Trump such a high chance of winning? It became apparent that many of the polls might be wrong. For instance, the average of polls had Trump ahead in Nevada. The early vote data suggested that Clinton should win the state, and she did, by about 2.5%. The early vote also suggested that Trump might outperform the polls in Georgia, Arizona, North Carolina, Iowa, and Ohio. In fact, he outperformed the polls in Iowa and Ohio by well over 5 points. Because of this, I thought Trump could outperform the polls elsewhere, and hence my model gave him a better chance of winning. Unfortunately, the amount of data provided by early voting is minimal, and only a few states provide party ID and/or demographics.

Polls are created by collecting responses from phone calls and then weighting those responses based on what the pollster believes the electorate will look like in terms of location, party ID, age, gender, and ethnicity. For example, let’s look at Florida. A pollster knows from the previous presidential election what percentage of the vote comes from the Panhandle, NE, I-4 Corridor, SE, SW, and North Central parts of the state. They also know what percentage of the state's voters are Democrats, Republicans, and Independents, as well as the gender and ethnic makeup of the electorate. They weight their polling data to fit that electorate. A pollster may adjust some of these assumptions, for instance by expecting the Latino share of the vote to be a percentage point higher than in the previous election. Polls generally have an error of around +/- 4%. So a poll saying Clinton would win by 2 points means she may win by 6 points or lose by 2 points. That is a wide range, which is why, when modeling, we like to take the average of several polls to average out some of the inaccuracies. The average of polls in Florida was not that bad. It had Trump winning by 0.2%, and he won the state by 1.2%. I actually thought Clinton would win the state by a similar margin. Why? The early vote showed a massive turnout and an increased Latino vote. A massive turnout usually favors the Democrats (it means their turnout machine got minorities to the polls). But Trump won the small, heavily conservative counties by huge margins, about 10 points better than Romney. I did not see that coming, nor did the Trump campaign, which thought the early results from Miami-Dade, Broward, and Palm Beach counties were going to doom them.
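The margin-of-error arithmetic above, and the reason for averaging several polls, can be sketched in a few lines of Python. This is only an illustration, not the model used for the forecasts here: the function names and the four example poll margins are made up, the +/- 4 figure is applied directly to the lead as in the text, and the square-root rule assumes each poll's error is independent random noise.

```python
import math

def margin_range(lead, error=4.0):
    """Range of outcomes consistent with a polled lead, applying the
    +/- error directly to the lead as the text does."""
    return (lead - error, lead + error)

def average_lead(leads):
    """Simple unweighted average of several polls' leads."""
    return sum(leads) / len(leads)

def error_of_average(error, n_polls):
    """Approximate error of an average of n polls, ASSUMING the polls'
    errors are independent. A bias shared by every poll does not shrink."""
    return error / math.sqrt(n_polls)

# A single poll showing Clinton +2 with a +/- 4 error:
print(margin_range(2.0))          # (-2.0, 6.0): lose by 2 up to win by 6

# Four hypothetical polls of the same state (Clinton lead in points):
polls = [2.0, -1.0, 3.0, 0.0]
print(average_lead(polls))
print(error_of_average(4.0, len(polls)))
```

Averaging only helps with random noise. If every pollster makes the same assumption about the electorate and that assumption is off, the average inherits the full miss.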

So why were so many poll averages not only wrong, but wrong in favor of Clinton? Here are some state poll averages and their errors:

Florida: R+1, Iowa: R+6.6, Ohio: R+5.5, Nevada: D+3.4, North Carolina: R+3.5, Pennsylvania: R+3.1, Michigan: R+3.7, Wisconsin: R+7.5, and Minnesota: R+7.6. (R+ means Trump beat the poll average by that many points; D+ means Clinton did.) States like Virginia, Maine, New Hampshire, Colorado, and Arizona were polled fairly accurately. So other than Nevada and Arizona, states that Clinton won were polled accurately and states that Trump won were polled inaccurately.
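To see how one-sided these misses were, the errors listed above can be averaged with and without their sign. The short Python sketch below uses only the nine figures given here; the sign convention (positive for R+, negative for D+) and the variable names are my own choices for illustration.

```python
# Poll-average errors from the list above, in points.
# Convention assumed here: positive = Trump beat the polls (R+),
# negative = Clinton beat the polls (D+).
errors = {
    "Florida": 1.0,
    "Iowa": 6.6,
    "Ohio": 5.5,
    "Nevada": -3.4,
    "North Carolina": 3.5,
    "Pennsylvania": 3.1,
    "Michigan": 3.7,
    "Wisconsin": 7.5,
    "Minnesota": 7.6,
}

n = len(errors)
mean_signed = sum(errors.values()) / n                    # direction of the miss
mean_absolute = sum(abs(e) for e in errors.values()) / n  # size of the miss
toward_trump = sum(1 for e in errors.values() if e > 0)

print(f"Mean signed error:   R+{mean_signed:.1f}")
print(f"Mean absolute error: {mean_absolute:.1f} points")
print(f"States where Trump beat the polls: {toward_trump} of {n}")
```

If the misses were random noise, the signed average would sit near zero while the absolute average stayed large. Here the two are close together, which is what a shared bias toward Clinton looks like.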

Time will tell why this happened, but there are many possible explanations. First, polls may have oversampled Democrats, assuming their turnout would be as big as in the Obama elections of 2008 and 2012. Democrats outpaced Republicans by 7% in 2008 and 6% in 2012. Exit polls suggest that Democrats had only a 4% advantage in this election. Also, polls consistently showed Trump winning 85% of Republican voters and Clinton winning at least 90% of Democrats. Exit polls showed that Trump won 90% of Republicans and Clinton won 89% of Democrats. Second, third-party candidates garnered over 5% in most states, and that makes polling more difficult. Third, it is possible that polling companies felt there was no way that a person as un-presidential and unfavorable as Trump could win the election, and they may have altered their models to show this. In essence, the polls showed what the media wanted. Fourth, it seems that Trump won a higher percentage of working-class Independents and/or Democrats than the polls indicated. This was especially true in the “rust belt” states. These may be union workers who did not want to admit to pollsters they were voting for Trump. Fifth, Trump not only won non-educated Whites by over 30%, he won educated Whites by 5%. Most polls indicated he was losing educated Whites. It is possible that educated Whites could not admit to pollsters they were voting for Trump for fear of being labeled racist. Sixth, Trump won a higher percentage of African-Americans and Latinos than Romney did, and these voters possibly did not want to admit to pollsters that they were for Trump. Seventh, polls cannot model enthusiasm. But a model can take into account that a candidate is outmanned and outspent by a 3 to 1 ratio. Finally, late deciders broke for Trump, according to exit polls.
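The first explanation can be made concrete with some rough arithmetic. The Python sketch below uses the loyalty rates and party gaps quoted above (a 6-point Democratic edge and 90%/85% loyalty in the polls, versus a 4-point edge and 89%/90% in the exit polls). The party shares themselves (38/32 and 37/33) are hypothetical numbers I picked only to produce those gaps, and the calculation deliberately ignores Independents, crossover voters, and third parties, so treat the result as an order-of-magnitude illustration rather than an estimate.

```python
def partisan_lead(dem_share, rep_share, dem_loyalty, rep_loyalty):
    """Clinton's lead in points, counting ONLY Democrats who back Clinton
    and Republicans who back Trump (a deliberate simplification)."""
    return 100 * (dem_share * dem_loyalty - rep_share * rep_loyalty)

# What the polls assumed: Democrats +6, Clinton holds 90% of Democrats,
# Trump holds 85% of Republicans. Shares of 38% / 32% are HYPOTHETICAL.
polled = partisan_lead(0.38, 0.32, 0.90, 0.85)

# What exit polls suggested: Democrats +4, Clinton 89%, Trump 90%.
# Shares of 37% / 33% are HYPOTHETICAL.
actual = partisan_lead(0.37, 0.33, 0.89, 0.90)

shift = polled - actual
print(f"Lead from partisans, poll assumptions: D+{polled:.1f}")
print(f"Lead from partisans, exit-poll figures: D+{actual:.1f}")
print(f"Shift toward Trump: {shift:.1f} points")
```

Under these assumptions, a two-point change in the party gap plus a five-point change in Republican loyalty moves the result by a few points, which is the same size as many of the state misses listed earlier.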

Interestingly, many exit polls (the polling of people right after they vote) were also wrong. Why did it take so long for media outlets to call South Carolina or Utah? Trump won these states by 15 and 20 points, respectively. Nevada and Colorado, which Clinton won by 2.5%, were called far faster. What this tells me is that exit polls in many cases were also wrong.

