Updating our analysis of the 2016 U.S. Presidential Election for 2020, just 2.5 weeks before the November 3 election, we see several similarities and differences.
Let’s start with the differences: the traditional and the probabilistic polls were miles apart in 2016, with the former predicting a strong Democrat win, and the latter a likely Republican win. In the four months before the 2016 election, these polls also reacted differently to key events, as detailed in our paper and blog.
In contrast, the traditional and the probabilistic polls are more strongly correlated in 2020 (0.74), and this time around the probabilistic poll also predicts a likely Democrat win, outside the error/confidence band (shown in grey in the chart below):
As to the similarities, false news is still a dominant factor driving these polls. First, the sharing of false news has increased fourfold compared to 2016. Second, our analysis again shows that a one-standard-deviation increase in false news has a strong impact on the poll gap between Biden and Trump. Using Vector Autoregression analysis, which earned Christopher Sims the 2011 Nobel Prize in Economics, we again narrowed down hundreds of potential drivers into a handful that are leading KPIs of the polls. Based on the model estimates, we calculate (1) the poll gap impact of a change to each driver (a standard deviation change, to compare apples to apples), and (2) to what extent the poll gap is explained by all past changes in each driver (the dynamic explanatory power). First, the significant impact is shown in the chart below: it is stronger for false news than it is for the candidates’ own actions such as social media posts and advertising, and more important than the size of the candidates’ following on Facebook, Twitter, Instagram and YouTube:
The topic of false news matters, just as in 2016, but the topics have changed. Interestingly, the traditional polls are mostly affected by false news about Black Lives Matter, while the probabilistic polls are mostly affected by false news about Biden’s debate preparation and performance.
Both polls are affected by false news about COVID-19, which increases in importance over time. The graph below shows the % of the poll gap explained by each driver from the day the driver changed (day 1 in the chart) up to 21 days later:
Besides false news about COVID-19, the importance of the candidates’ social media following also increases over time. While the probabilistic poll gap is mostly driven by growth in Trump’s Twitter following, the traditional poll gap is mostly driven by growth in Trump’s Facebook following and Biden’s Instagram following.
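For readers curious about the mechanics, here is a minimal sketch of the two quantities discussed above: the response of the poll gap to a one-standard-deviation change in a driver, and the share of the poll gap explained by each driver over 21 days. It is an illustration only, not our actual model. It assumes a two-variable VAR with a single lag, runs on simulated data rather than our poll and social media series, and orders false news before the poll gap so that false news can move the gap on the same day. The function names (`fit_var1`, `orthogonal_irf`, `fevd_shares`) and the coefficients in `true_A` are hypothetical.

```python
import numpy as np

def fit_var1(data):
    """OLS fit of y_t = c + A @ y_{t-1} + e_t. Returns (A, residual covariance)."""
    y, y_lag = data[1:], data[:-1]
    X = np.column_stack([np.ones(len(y_lag)), y_lag])  # intercept + lagged values
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)       # shape (k + 1, k)
    A = coef[1:].T                  # A[i, j]: effect of lagged variable j on variable i
    resid = y - X @ coef
    sigma = resid.T @ resid / (len(y) - X.shape[1])
    return A, sigma

def orthogonal_irf(A, sigma, horizon):
    """Responses to one-standard-deviation shocks (Cholesky-orthogonalized).

    irf[h, i, j] is the response of variable i, h days after a shock to variable j.
    """
    P = np.linalg.cholesky(sigma)   # the variable ordering matters here
    irf = np.empty((horizon, *A.shape))
    power = np.eye(A.shape[0])
    for h in range(horizon):
        irf[h] = power @ P
        power = A @ power
    return irf

def fevd_shares(irf):
    """Share of the forecast error variance of variable i explained by shock j."""
    cumulative = np.cumsum(irf ** 2, axis=0)
    return cumulative / cumulative.sum(axis=2, keepdims=True)

# Simulated daily data (assumption): column 0 = false news, column 1 = poll gap.
rng = np.random.default_rng(0)
true_A = np.array([[0.5, 0.0],
                   [0.4, 0.6]])    # false news feeds into next day's poll gap
n_days = 500
data = np.zeros((n_days, 2))
for t in range(1, n_days):
    data[t] = true_A @ data[t - 1] + rng.normal(size=2)

A_hat, sigma_hat = fit_var1(data)
irf = orthogonal_irf(A_hat, sigma_hat, horizon=21)
shares = fevd_shares(irf)

print("Poll gap response to a false news shock, first 5 days:", irf[:5, 1, 0])
print("Share of poll gap explained by false news, day 1 vs day 21:",
      shares[0, 1, 0], shares[-1, 1, 0])
```

In this toy setup the share of the poll gap explained by false news grows with the horizon, which is the kind of pattern the 21-day chart displays for our real drivers. A full analysis would include many more variables and lags, and would typically rely on a dedicated time-series library rather than hand-rolled estimation.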
Perhaps more important than the false news that does catch on is the false news that does not: several false news items have no impact on the polls, including the alleged email leaks of Biden and his son Hunter, defunding the police, and voter fraud. False news about Biden and fraud is actually associated with higher poll numbers for the candidate, just like the false news about Clinton planning to introduce Sharia law in 2016. Thus, it is not the amount of false news but the topic that matters.
Where does this leave us for our prediction for the 2020 US presidential election? Both types of polls favor Biden, and while the prediction markets are more split (see above for today’s PredictIt, 10/22), they got it spectacularly wrong in 2016 (when they went 82% for Hillary Clinton). Still, the 2016 game changed in the last 2 weeks, when Hispanic and Asian Americans in particular turned away from the Democratic candidate, as her emails were front and center in both Comey’s letter to Congress and in fake news. In 2020, these demographics are again key to the election, as Trump scores well with, for example, Hispanic men in Florida. However, ‘but his emails’ appears unlikely to drive this preference. Despite the Biden email leak 3 weeks before the election, we don’t see the topic impacting the polls. We predict a Biden win in November.