Recently, in Pennsylvania, Donald Trump said, “The only way they can beat me in my opinion, and I mean this 100 percent, if in certain sections of the state they cheat.” He never said how he determined that. If it is on the basis of the people he talked to as he campaigned, he had a very biased sample.

At about the time of Trump’s remarks, there was a poll showing 50% voting for Clinton, 40.6% for Trump, and the rest undecided or not stating an opinion. Let’s look at the poll result through the eyes of an elementary statistics class. We are not going to give a tutorial on that subject here, so if you haven’t had the class, you’ll have to look online or ask a friend.

Suppose we have 8.2 million marbles (representing the registered voters in PA) in a really big bowl. Think of one of those dumpsters they use to haul away construction waste. Suppose we reach in and pick out 900 marbles at random, which is the size of a typical Gallup poll. For each blue Hillary Clinton marble we add 1 to our total, for each red Donald Trump marble we subtract 1, and for each white undecided marble we add 0.

The outcomes of the 900 draws are independent. To simplify the arithmetic, we note that since our draws only take the values -1, 0, and 1, they have variance at most 1. Thus when we add up the 900 results and divide by 900, the standard deviation of the average is at most (1/900)^{1/2} = 1/30. By the normal approximation (central limit theorem), about 95% of the time the result will be within 2/30 = 0.0667 of the true mean. In the poll results above the average is 0.5 - 0.406 = 0.094, which is more than 0.0667 above zero, so by Statistics 101 reasoning we are 95% confident that there are more blue marbles than red marbles in the “bowl.”
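For readers who want to check the arithmetic, here is a minimal Python sketch of the calculation above. The function name poll_margin and the multiplier z = 2 are my own choices for illustration. Besides the conservative bound (variance at most 1), it also computes a slightly sharper margin by plugging in the variance implied by the poll percentages, which is an extra step not used in the argument above.

```python
import math

def poll_margin(p_blue, p_red, n, z=2.0):
    """Return (lead, conservative margin, plug-in margin) for +1/0/-1 scoring.

    p_blue, p_red: fractions of the sample for each candidate.
    n: sample size. z: number of standard deviations (2 is roughly 95%).
    """
    lead = p_blue - p_red
    # Conservative bound: each draw is -1, 0, or 1, so its variance is at most 1.
    se_bound = math.sqrt(1.0 / n)
    # Plug-in variance: E[X^2] - (E[X])^2, where E[X^2] = p_blue + p_red.
    variance = (p_blue + p_red) - lead ** 2
    se_plug_in = math.sqrt(variance / n)
    return lead, z * se_bound, z * se_plug_in

lead, margin_bound, margin_plug_in = poll_margin(0.50, 0.406, 900)
print(f"observed lead:              {lead:.3f}")
print(f"margin (variance bound 1):  {margin_bound:.4f}")
print(f"margin (plug-in variance):  {margin_plug_in:.4f}")
print("lead exceeds margin:", lead > margin_bound)
```

The plug-in margin comes out a little smaller than 2/30, so using the bound of 1 only makes the conclusion more cautious.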

That analysis is oversimplified in at least two ways. First of all, when you draw a marble out of the bowl you get to see what color it is. If you ask a person who they are going to vote for, they may not tell you the truth. It is for this reason that the use of exit polls has been discontinued. If you ask people how they voted when they leave the polling place, what you estimate is the fraction of blue voters among those willing to talk to you, not the fraction of people who voted for blue. A second problem with our analysis is that people will change their opinions over time.

A much more sophisticated analysis of polling data can be found at FiveThirtyEight.com, specifically at http://projects.fivethirtyeight.com/2016-election-forecast/ There, if you hover your mouse over Pennsylvania (today is August 16), you find that Hillary has an 89.3% chance of winning Pennsylvania versus Donald Trump’s 10.7%, which is about the same as the predictions for the overall winner of the election.

The methodology used is described in detail at

In short, they use a weighted average of the results of about 10 polls, with weights based on how well the polls have done in the past. In addition, they are conservative in the early going, since surprises can occur.
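To make the idea of a weighted average concrete, here is a toy Python sketch. It is not FiveThirtyEight's actual model: the function weighted_poll_average is my own, and the three leads and weights are made-up numbers chosen purely for illustration, not real polls or real pollster ratings.

```python
def weighted_poll_average(polls):
    """Average poll leads, giving more influence to polls with larger weights.

    polls: list of (lead, weight) pairs, where lead is the blue-minus-red
    margin as a fraction and weight is a positive reliability score.
    """
    total_weight = sum(weight for _, weight in polls)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(lead * weight for lead, weight in polls) / total_weight

# Hypothetical inputs for illustration only (not real polling data).
hypothetical_polls = [
    (0.09, 1.0),  # a pollster with a strong track record
    (0.11, 0.5),  # a pollster with a weaker track record
    (0.06, 0.8),
]

average_lead = weighted_poll_average(hypothetical_polls)
print(f"weighted average lead: {average_lead:.4f}")
```

With equal weights this reduces to the ordinary average, so the weights only matter when some polls are trusted more than others.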

Nate Silver, the founder of 538.com, burst onto the scene in 2008 when he correctly predicted the way 49 of 50 states voted in the 2008 election. In 2012, while CNN was noting Obama and Romney were tied at 47% of the popular vote, he correctly predicted that Obama would receive more than 310 electoral votes and easily win the election.

So Donald, based on the discussion above, I can confidently say that no cheating is needed for you to lose Pennsylvania. Indeed, at this point in time, it would take a miracle for you to win it.