Category Archives: Probability

North Carolina Gubernatorial Election

Tuesday night, after the 4.7 million votes had been counted from all 2704 precincts, Roy Cooper had a 4772-vote lead over Pat McCrory. Since there could be as many as 62,500 absentee and provisional ballots, it was decided to wait until these were counted to declare a winner. The question addressed here is: What is the probability that these votes will change the outcome?

To do the calculation we need to make an assumption: the additional votes are similar to the overall population, so they are like flipping coins. In order to change the outcome of the election, Cooper would have to get fewer than 31,250 - (4772)/2 = 28,864 votes. The standard deviation of the number of heads in 62,500 coin flips is (62,500 × 1/4)^{1/2} = 125, so this represents 19.09 standard deviations below the mean.

One could be brave and use the normal approximation. However, all this semester, while I have been teaching Math 230 (Elementary Probability), people have been asking: why do this when we can just use our calculator?

Binomcdf(62500, 0.5, 28864) = 1.436 × 10^{-81}

In contrast, if we use the normal approximation with the tail bound (which I found impossible to type using equation editor) we get 1.533 × 10^{-81}.
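For readers without a calculator that offers Binomcdf, both numbers can be checked with a short script. What follows is a minimal sketch, assuming the fair-coin model above with 62,500 ballots, and assuming that "the tail bound" means the standard inequality P(Z > x) ≤ φ(x)/x evaluated at the rounded value x = 19.09. The function names are illustrative, not from any calculator or library.

```python
import math

def fair_binom_cdf(n, k):
    """P(X <= k) for X ~ Binomial(n, 1/2), computed with exact integers."""
    term = 1      # C(n, 0)
    total = 1
    for j in range(k):
        # C(n, j+1) = C(n, j) * (n - j) / (j + 1); the division is exact
        term = term * (n - j) // (j + 1)
        total += term
    # int / int in Python is correctly rounded even for huge operands
    return total / (1 << n)

def normal_tail_bound(x):
    """Upper bound phi(x)/x on P(Z > x) for a standard normal Z, x > 0."""
    return math.exp(-x * x / 2) / (x * math.sqrt(2 * math.pi))

n = 62_500                        # outstanding ballots (from the text)
lead = 4_772                      # Cooper's lead (from the text)
threshold = n // 2 - lead // 2    # 31,250 - 2,386
sd = math.sqrt(n / 4)             # standard deviation of the head count

exact = fair_binom_cdf(n, threshold)
bound = normal_tail_bound(19.09)  # assumption: rounded z-score from the text

print(f"threshold = {threshold}, sd = {sd}")
print(f"binomial cdf      = {exact:.3e}")
print(f"normal tail bound = {bound:.3e}")
```

The exact sum avoids floating point until the final division, which matters here because every individual term is far smaller than anything a naive product of probabilities could resolve.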

We can’t take this number too seriously, since the probability that our assumption is wrong is larger than that, but it suggests that we will likely have a new governor and House Bill 2 will soon be repealed.

Teaching Statistics using Donald Trump.

Recently, in Pennsylvania, Donald Trump said “The only way they can beat me in my opinion, and I mean this 100 percent, if in certain sections of the state they cheat.” He never said how he determined that. If it is on the basis of the people he talked to as he campaigned, he had a very biased sample.

At about the time of Trump’s remarks, there was a poll showing 50% voting for Clinton, 40.6% for Trump, and others undecided or not stating an opinion. Let’s look at the poll result through the eyes of an elementary statistics class. We are not going to give a tutorial on that subject here, so if you haven’t had the class, you’ll have to look online or ask a friend.

Suppose we have 8.2 million marbles (representing the registered voters in PA) in a really big bowl. Think of one of those dumpsters they use to haul away construction waste. Suppose we reach in and pick out 900 marbles at random, which is the size of a typical Gallup poll. For each blue Hillary Clinton marble we add 1 to our total, for each red Donald Trump marble we subtract 1, and for each white undecided marble we add 0.

The outcomes of the 900 draws are independent. To simplify the arithmetic, we note that since our draws only take the values -1, 0, and 1, they have variance at most 1. Thus when we add up the 900 results and divide by 900, the standard deviation of the average is at most (1/900)^{1/2} = 1/30. By the normal approximation (central limit theorem), about 95% of the time the result will be within 2/30 = 0.0667 of the true mean. In the poll results above the average is 0.5 - 0.406 = 0.094, which is more than 0.0667 above zero, so by Statistics 101 reasoning we are 95% confident that there are more blue marbles than red marbles in the “bowl.”
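The arithmetic in the marble calculation can be spelled out in a few lines. This is a minimal sketch using the sample size and poll shares quoted above; the second, sharper standard error is an added assumption-based refinement (it uses the poll shares themselves to compute the variance) and is not part of the argument in the text.

```python
import math

n = 900                        # sample size from the text
clinton, trump = 0.50, 0.406   # poll shares from the text

# Crude bound used in the text: each draw has variance at most 1.
se_bound = math.sqrt(1 / n)    # 1/30
margin_bound = 2 * se_bound    # two standard errors

# Sharper (assumed) calculation: a draw is +1, -1, or 0, so
# E[X] = clinton - trump and E[X^2] = clinton + trump.
mean = clinton - trump
variance = (clinton + trump) - mean ** 2
se_from_shares = math.sqrt(variance / n)

print(f"observed lead        = {mean:.3f}")
print(f"2 * SE (bound)       = {margin_bound:.4f}")
print(f"2 * SE (from shares) = {2 * se_from_shares:.4f}")
print("lead exceeds margin:", mean > margin_bound)
```

Because the variance bound of 1 is generous, the conclusion only gets stronger when the variance is computed from the shares.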

That analysis is oversimplified in at least two ways. First of all, when you draw a marble out of the bowl you get to see what color it is. If you ask a person who they are going to vote for, then they may not tell you the truth. It is for this reason that use of exit polls has been discontinued. If you ask people how they voted when they leave the polling place, what you estimate is the fraction of blue voters among those willing to talk to you, not the fraction of people who voted for blue. A second problem with our analysis is that people will change their opinions over time.

A much more sophisticated analysis of polling data can be found at FiveThirtyEight.com, specifically at http://projects.fivethirtyeight.com/2016-election-forecast/ There, if you hover your mouse over Pennsylvania (today is August 16), you find that Hillary has an 89.3% chance of winning Pennsylvania versus Donald Trump’s 10.7%, which is about the same as the predictions for the overall winner of the election.

The methodology used is described in detail at

http://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast/

In short they use a weighted average of the results of about 10 polls with weights based on how well the polls have done in the past. In addition they are conservative in the early going, since surprises can occur.

Nate Silver, the founder of 538.com, burst onto the scene in 2008 when he correctly predicted the way 49 of 50 states voted in the 2008 election. In 2012, while CNN was noting Obama and Romney were tied at 47% of the popular vote, he correctly predicted that Obama would receive more than 310 electoral votes and easily win the election.

So Donald, based on the discussion above, I can confidently say that no cheating is needed for you to lose Pennsylvania. Indeed, at this point in time, it would take a miracle for you to win it.

The odds of a perfect bracket are roughly a billion to 1

This time of year it is widely quoted that the odds of picking a perfect bracket are 9.2 quintillion to one. In scientific notation that is 9.2 × 10^{18}, or if you like writing out all the digits, it is 9,223,372,036,854,775,808 to 1. That number is 2^{63}, i.e., the odds against success if you flip a coin to make each of the 63 picks.

If you know a little then you can do much better than this, say by taking into account the fact that a 16 seed has never beaten a one seed. In a story widely quoted last year, “Duke math professor Jonathan Mattingly calculated the odds of picking all 32 games correctly is actually one in 2.4 trillion.” He doesn’t give any details, but I don’t know why I should trust a person who doesn’t know there are 63 games in the tournament.

Using a different approach, DePaul mathematician Jeff Bergen calculated the odds at one in 128 billion. His YouTube video from four years ago https://www.youtube.com/watch?v=O6Smkv11Mj4 is entertaining but light on details.

Here I will argue that the odds are closer to one billion to 1. The key to my calculation of the probability of a perfect bracket is to use data from the outcomes of the first round games for 20 years of NCAA 64-team tournaments. The columns give the matchup, the number of times each of the two seeds won, and the fraction of games won by the more successful seed.

Matchup    Record    Fraction
1-16       80-0      1
2-15       76-4      0.95
3-14       67-13     0.8375
4-13       64-16     0.8
5-12       54-26     0.675
6-11       56-24     0.7
7-10       48-32     0.6
8-9        37-43     0.5375

From this we see that if we pick the 9 seed to “upset” the 8 seed but in all other cases pick the higher seed, then we will pick all 8 games correctly with probability 0.09699, or about 0.1, compared to the 1/256 chance you would have by guessing.

Not having data for the other seven games, I will make the rash but simple assumption that the probability of picking these seven games correctly is also 0.1. Combining our two estimates, we see that the probability of perfectly predicting a regional tournament is 0.01. All four regional tournaments can then be done with probability 10^{-8}. There are three games to pick the champion from the final four. If we simply guess at this point we have a 1 in 8 chance, and a final answer of about 1 in a billion.
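The chain of multiplications above is short enough to check by machine. Here is a minimal sketch using the first-round fractions from the table; the function name is illustrative, and the 0.1 for the later regional games is the "rash but simple" assumption from the text, not data.

```python
import math

# Fraction of first-round games won by the seed we pick in each matchup
# (the 9 seed in the 8-9 game, the higher seed everywhere else).
first_round = [1.0, 0.95, 0.8375, 0.8, 0.675, 0.7, 0.6, 0.5375]
p_first_round = math.prod(first_round)

def perfect_bracket_probability(p_first, p_rest, p_final_four=1 / 8):
    """Four independent regions, then three coin-flip games at the end."""
    p_region = p_first * p_rest
    return p_region ** 4 * p_final_four

rounded = perfect_bracket_probability(0.1, 0.1)              # as in the text
unrounded = perfect_bracket_probability(p_first_round, 0.1)  # no rounding of 0.09699

print(f"first round, one region: {p_first_round:.5f}")
print(f"odds with rounding:    1 in {1 / rounded:,.0f}")
print(f"odds without rounding: 1 in {1 / unrounded:,.0f}")
```

Either way the answer stays within a factor of two of one in a billion, so rounding 0.09699 up to 0.1 does not drive the conclusion.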

To argue that this number is reasonable, let’s take a look at what happened in the 2015 bracket challenge. 320 points are up for grabs in each round: 10 points for each of the 32 first round games (the play-in or “first four” games are ignored), 20 for each of the 16 second round games, and so on until picking the champion gives you 320 points. The top ranked bracket had

27 x 10 + 14 x 20 + 8 x 40 + 4 x 80 + 2 x 160 + 1 x 320 = 1830 points out of 1920.

This person missed 5 first round and 2 second round games. There are a number of other people with scores of 1800 or more, so it is not too far-fetched to believe that if the number of entries was increased by a factor of 2^7 = 128 we might have a perfect bracket. The last calculation is a little dubious, but if the true odds were 2.4 trillion to one or even 128 billion to 1, it is doubtful that one of 11 million entrants would get this close.
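The scoring for the top bracket can be tallied mechanically. This is a small sketch using only the round sizes and pick counts quoted above; the variable names are illustrative. Reading the factor of 128 as one factor of 2 per missed game is an interpretation, not something the text states.

```python
# Points per correct pick double each round: 10, 20, 40, 80, 160, 320.
games_per_round = [32, 16, 8, 4, 2, 1]
correct_picks = [27, 14, 8, 4, 2, 1]   # top-ranked 2015 bracket, from the text
points = [10 * 2 ** r for r in range(6)]

max_score = sum(g * p for g, p in zip(games_per_round, points))
score = sum(c * p for c, p in zip(correct_picks, points))
missed = [g - c for g, c in zip(games_per_round, correct_picks)]

print(score, "out of", max_score)
print("missed per round:", missed)
# Assumed reading of the text: one factor of 2 per missed game.
print("scale-up factor:", 2 ** sum(missed))
```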

With some more work one could collect data on how often an ith seed beats a jth seed when they meet in a regional tournament or perhaps you could convince ESPN to see how many of its 11 million entrants managed to pick a regional tournament correctly. But that is too much work for a lazy person like myself on a beautiful day during Spring Break.

Probability and the Florida Lottery

The usual probability story in this context is something like the following. A New Jersey woman, Evelyn Adams, won the lottery twice within a span of four months, raking in a total of 5.4 million dollars. She won the jackpot for the first time on October 23, 1985 in the Lotto 6/39, in which you pick 6 numbers out of 39. Then she won the jackpot in the new Lotto 6/42 on February 13, 1986. Lottery officials calculated the probability of this as roughly one in 17.1 trillion, which is the probability that one preselected person won the lottery on two preselected dates.
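The officials' figure is easy to reproduce. Here is a minimal sketch, assuming the preselected person buys exactly one ticket in each of the two drawings (an assumption, since the text does not say how many tickets are involved).

```python
import math

tickets_6_39 = math.comb(39, 6)   # possible tickets in Lotto 6/39
tickets_6_42 = math.comb(42, 6)   # possible tickets in Lotto 6/42

# One preselected person, one ticket per drawing, two preselected drawings.
odds_both = tickets_6_39 * tickets_6_42

print(f"{tickets_6_39:,} x {tickets_6_42:,} = {odds_both:,}")
print(f"about 1 in {odds_both / 1e12:.1f} trillion")
```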

The story changes when one realizes that (i) somebody won the October 23, 1985 lottery; (ii) we would have been equally impressed if this had happened twice within a one-year period (100 twice-weekly drawings); and (iii) many people who play the lottery buy more than one ticket. Taking these three things into account, the probability is now about 1/200. If we also take into account the number of states with lotteries, a double win somewhere becomes even less surprising. For more examples of things that aren’t as surprising as they seem, look at http://www.math.duke.edu/~rtd/Talks/Emory.pdf.

A recent paper on the arXiv (1503.02902v1) by Rich Arratia, Skip Garibaldi, Lawrence Mower, and Philip B. Stark tells a different type of story. In Florida’s Play 4 game you pick a four-digit number like 3782, and if all four digits match you win $5000. The fact that this event has probability 1/10,000, and hence a $1 ticket returns $0.50 on average, says either that (i) people can’t think or (ii) they have utility functions that value a large sum disproportionately more than the $1 you use to play the game.

Some people, however, are very good at winning this gamble. An individual whom we will call LJ has won 57 times. Now that by itself is not proof of guilt. If he bought 570,000 tickets he would end up with about this many wins. However, that seems a little unlikely. If he only bought 250,000 tickets, the probability of 57 wins is 1.22 × 10^{-8}. (Exercise for the reader.)
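For anyone attempting the exercise, here is one way to set it up. It is a minimal sketch, assuming each ticket wins independently with probability 1/10,000 so that the number of wins is binomial. The text does not say whether "57 wins" means exactly 57 or at least 57, so the sketch prints both; this is an independent check to compare against the figure quoted above, not necessarily the calculation behind it. The function name is illustrative.

```python
import math

def binom_pmf(n, k, p):
    """P(X = k) for X ~ Binomial(n, p), evaluated in log space."""
    log_pmf = (math.log(math.comb(n, k))
               + k * math.log(p)
               + (n - k) * math.log1p(-p))
    return math.exp(log_pmf)

p_win = 1 / 10_000    # chance one Play 4 ticket matches all four digits
wins = 57

expected_return = 5000 * p_win   # dollars back per $1 ticket
tickets_for_57 = wins / p_win    # tickets whose expected number of wins is 57

exactly_57 = binom_pmf(250_000, wins, p_win)
# Tail P(X >= 57): the terms shrink quickly, so 200 of them is ample.
at_least_57 = sum(binom_pmf(250_000, k, p_win)
                  for k in range(wins, wins + 200))

print(f"expected return per $1 ticket: ${expected_return:.2f}")
print(f"tickets for 57 expected wins:  {tickets_for_57:,.0f}")
print(f"P(exactly 57 wins | 250,000 tickets)  = {exactly_57:.3e}")
print(f"P(at least 57 wins | 250,000 tickets) = {at_least_57:.3e}")
```

Working in log space keeps the huge binomial coefficient and the tiny power of p from overflowing or underflowing before they are combined.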

Arratia et al. give a very nice calculation that shows something funny must be going on. Skipping the math, the bottom line is that if the 19 million people who live in Florida all sold their houses, took the $175,000 in proceeds (this is the average house value), and bought lottery tickets (reinvesting the winnings) until they ran out of money, the probability that someone would win 57 times or more is 1 in a million.

How did LJ get so lucky? Well, there are three common schemes. (i) A clerk can scratch the ticket with a pin, revealing enough of the bar code to be able to scan it to see if it is a winner. (ii) Sometimes a customer will ask the clerk if the ticket was a winner. If it was, the clerk may say it was not and keep the money himself. (iii) Sometimes the winner may be an illegal immigrant or owe child support or back taxes, and will sell the ticket to an aggregator who pays half price for it and later claims the prize. This is a good scheme for people who want to launder money.

It would be nice if I could tell you that probability helped catch a criminal, but at least it wasn’t involved in a miscarriage of justice like the one Sally Clark experienced. She was convicted of murder based on the calculation that the odds were 73 million to 1 against two of her children dying of what is called cot death in the UK.