By Daniel Egger, Director – Center for Quantitative Modeling, Master of Engineering Management Program, Pratt School of Engineering

In the week leading up to Tuesday’s presidential election, Nate Silver and his well-known political forecasting web site www.fivethirtyeight.com received harsh public criticism for assigning Donald Trump a much higher probability of winning than other, similar sites did.

If I recall correctly, this contrast peaked around Thursday, November 3, when Silver gave Hillary Clinton “only” a 65% probability of winning, while other mainstream projections all assigned her winning probabilities greater than 90%. Silver tried to defend himself by saying, in effect, that his expected vote percentages for Clinton were really not so different from those of other forecasters, but that he assumed higher variance around those numbers, so that her November 3 projected margin of victory (3-4%) was within the margin of error.

In hindsight, all forecasts, including Silver’s, based on third-party polling were quite wrong about the vote percentages, although Clinton still “won” the popular vote. Even more unexpected by the forecasters, Trump won the election in an unforeseen Electoral College rout.

It seems to me that Nate Silver should be given credit for being much *less wrong* than others (not to mention for sticking to his guns under withering, and in hindsight extremely foolish, criticism).

As a data scientist, I am interested in metrics that can quantify the relative effectiveness of probabilistic forecasts, in order to optimize forecasts in the future. What follows are two methods, which you may not have seen before, that allow us to *quantify* exactly *how much less wrong* Nate Silver was than everyone else.

First I’ll use a standard Bayesian Inverse Probability approach. This approach is attractive in the present situation because, unlike statistical methods that require a large sample of outcomes in order to be reliable, it works perfectly fine for a sample size of one: the one election outcome that we have.

Under this method, I must first assume that one of the two probabilistic processes we are comparing is in fact responsible for generating any observed election outcomes. The first probability distribution, which I’ll call the “Consensus Process,” assigned probabilities of approximately 90% to a Clinton victory and 10% to a Trump victory. The second, “Silver Process,” assigned probabilities of approximately 65% to a Clinton victory and 35% to a Trump victory. I assume further that before observing the present election outcome, we had no rational basis to believe one of these two processes more than the other. Therefore the probability of each *process*, before any results are observed, is 50/50.

Applying Bayes’ Theorem, if an election victory for Clinton were the observed outcome, then it would be reasonable to infer that the probability that the Consensus Process generated the outcome was 58%, and that the Silver Process generated the outcome was 42%. This metric gives the edge to the Consensus Process, but not by an overwhelming margin.

On the other hand, since an election victory for Trump is what was actually observed, it is reasonable to infer that the probability that the Consensus Process generated the observed result is only 22%, while the probability that the Silver Process generated it is 78%.[i]
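The two posterior calculations can be checked with a short Python sketch. The function name `posterior` and the dictionary layout are illustrative choices, not part of the original analysis; the only inputs are the 90%/65% forecasts and the 50/50 prior over the two processes stated above.

```python
def posterior(likelihoods, priors):
    """Return P(process | outcome) for each process via Bayes' Theorem."""
    # Joint probability of each process together with the observed outcome.
    joint = {name: likelihoods[name] * priors[name] for name in likelihoods}
    # Total probability of the observed outcome across both processes.
    evidence = sum(joint.values())
    return {name: value / evidence for name, value in joint.items()}


# Equal prior belief in the two forecasting processes.
priors = {"Consensus": 0.5, "Silver": 0.5}

# Probability each process assigned to a Clinton win; Trump gets the rest.
p_clinton = {"Consensus": 0.90, "Silver": 0.65}
p_trump = {name: 1.0 - p for name, p in p_clinton.items()}

given_clinton = posterior(p_clinton, priors)
given_trump = posterior(p_trump, priors)

for label, post in (("Clinton win", given_clinton), ("Trump win", given_trump)):
    print(label, {name: round(p, 2) for name, p in post.items()})
```

Because the priors are equal, they cancel, and each posterior is simply that process's likelihood divided by the sum of the two likelihoods.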

This is a pretty dramatic win for Silver, and it would seem that those who criticized him owe him an apology. It also suggests that he himself thinks of probabilities from a Bayesian Inference point of view, and is trying to minimize his parameter error in that context.

The second method will be familiar to my data science graduate students at Duke, with whom I approach the subject from both a Bayesian and an Information Theory perspective.[ii] I first need to assume a “base rate” for election of Republican and Democratic Presidential candidates. I will use a base rate of 50/50 (because the last 10 elections have been split right down the middle: 5 for the Democrat, 5 for the Republican; or because I really have no idea). Next, I treat the Consensus and Silver probabilistic forecasts as “side information” that could potentially allow a gambler who relies on one of them to bet on the outcome more successfully – with less uncertainty – than a gambler who knew only the base rate. The advantage of the second method is that rather than being a relative comparison of only the two processes, it is an absolute measure of forecast quality and any number of additional forecasts can also be compared using the same metric.

Based on a Clinton win, the Consensus probability would have reduced a gambler’s uncertainty about the outcome by 76.3%, while the Silver probability would have reduced their uncertainty by only 24.6%.

On the other hand, given a Trump win, both forecasts were *worse* than the base rate, so a gambler believing in them would have lost money, but the losses would be worse for the gambler betting based on the Consensus forecast. A gambler trusting in the Consensus would have *increased* their uncertainty over the base rate, by 23.2%, while a gambler trusting Silver would have increased uncertainty by only 18.0%.[iii]
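The four percentages can be reproduced with the quantity b × log2(b / r), where b is the probability a forecast assigned to the outcome that occurred and r is the base rate for that outcome. This formula is an assumption inferred from the reported figures, not one stated in the text, so the Python sketch below should be read as a consistency check rather than a definition of the method. The function names are hypothetical.

```python
import math


def side_information_bits(b, r=0.5):
    # Assumed formula, inferred from the reported figures: b * log2(b / r).
    return b * math.log2(b / r)


def entropy_bits(probs):
    # Shannon entropy in bits; a 50/50 base rate gives exactly 1 bit.
    return -sum(p * math.log2(p) for p in probs if p > 0)


base_rate = (0.5, 0.5)
h = entropy_bits(base_rate)

# Probability each forecast assigned to the outcome that occurred.
cases = {
    ("Clinton win", "Consensus"): 0.90,
    ("Clinton win", "Silver"): 0.65,
    ("Trump win", "Consensus"): 0.10,
    ("Trump win", "Silver"): 0.35,
}

# Information gain as a fraction of the base-rate entropy.
gains = {key: side_information_bits(b) / h for key, b in cases.items()}

for (outcome, method), gain in gains.items():
    print(f"{outcome:12s} {method:10s} {gain:+.1%}")
```

Under this assumed formula the gains round to 76.3% and 24.6% for a Clinton win, and to -23.2% and -18.0% for a Trump win, matching the figures quoted above.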

**Final Scores:**

Bayesian Inverse Probabilities:

Silver 78%, Consensus 22%

Kelly Side Information (both negative):

Silver -18.0%, Consensus -23.2%

For a detailed discussion and explanation of the first method, please see my Coursera course, Mastering Data Analysis with Excel.

https://www.coursera.org/learn/analytics-excel

For more details on the second method, you will need to enroll in my Data Mining course 590-05 here in the Duke MEM program.

[i] Bayes’ Theorem:

$$p(\text{Process} \mid \text{Observation}) = \frac{p(\text{Observation} \mid \text{Process})\, p(\text{Process})}{p(\text{Observation})}$$

For a Clinton win:

$$p(\text{Consensus Process} \mid \text{Clinton win}) = \frac{(0.9)(0.5)}{(0.9)(0.5) + (0.65)(0.5)} = 0.58$$

$$p(\text{Silver Process} \mid \text{Clinton win}) = \frac{(0.65)(0.5)}{(0.9)(0.5) + (0.65)(0.5)} = 0.42$$

Similarly, for a Trump win:

$$p(\text{Consensus Process} \mid \text{Trump win}) = \frac{(0.1)(0.5)}{(0.1)(0.5) + (0.35)(0.5)} = 0.22$$

$$p(\text{Silver Process} \mid \text{Trump win}) = \frac{(0.35)(0.5)}{(0.1)(0.5) + (0.35)(0.5)} = 0.78$$

[ii] This method is called “Kelly doubling-rate-of-wealth scoring for individual sequences” – I really need to work on a better name.

[iii] Side information, assuming base rate r = (0.5, 0.5) and the forecasts b(i) below:

| Outcome | Method | Forecast b(i) | Base rate r(i) | Side information (bits) | Entropy of base rate (bits) | Percent information gain |
| --- | --- | --- | --- | --- | --- | --- |
| Clinton win | Consensus | 0.9 | 0.5 | 0.763 | 1 | 76.3% |
| Clinton win | Silver | 0.65 | 0.5 | 0.246 | 1 | 24.6% |
| Trump win | Consensus | 0.1 | 0.5 | -0.232 | 1 | -23.2% |
| Trump win | Silver | 0.35 | 0.5 | -0.180 | 1 | -18.0% |