All posts by Rick Durrett, Ph.D.

Griffin vs. Riggs: what are the odds the result will change if 65,000 votes are thrown out?

February 22, 2025 | Politics, Probability | Rick Durrett, Ph.D.

Post updated March 6 and 7 — see the end.

Justice Allison Riggs maintains a 734-vote lead over Republican rival Jefferson Griffin after more than 5.5 million ballots were cast and two recounts. Griffin is seeking to have 65,000 ballots thrown out. Today I will answer a question no one has asked: What is the probability he will win the election if the votes are thrown out? The answer is 2/1000.

A reversal may sound extremely likely, but the square root of 65,000 is 255. Suppose we have 5.5 million ping pong balls in a swimming pool. Exactly half have R for Riggs and half have G for Griffin. We then draw 65,000 ping pong balls out. What is the probability that the outcome of the election is changed? If we count balls with R as +1 and balls with G as -1, then the election result is changed if the sum of these numbers, which we call S, is > 734, since S is the net number of votes lost by Riggs.

In the jargon of probability these balls are drawn without replacement, but given the large number of balls, this differs very little from drawing with replacement, which is just flipping 65,000 coins and counting +1 for heads and -1 for tails. On the average the net change in the outcome is 0, or in the terminology of probability the expected value of S = 0. Of course the resulting sum is unlikely to be exactly 0. In elementary probability class we learn how to compute the size of the typical deviation of S from 0, which is the standard deviation. In the case of 65,000 independent random variables that are +1 with probability ½ and -1 with probability ½ the standard deviation is the square root of 65,000 or 255.

The central limit theorem allows us to estimate the probability that the number of votes lost by Riggs, S, is > 734. This value is 734/255 = 2.878 standard deviations above the mean. Using my calculator function normalcdf I can calculate using the normal approximation that the probability the outcome of the election does not change is 0.997998, so the chance that the election outcome will change is about 2 out of 1000 or 0.2%.
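
For readers without a calculator that has normalcdf, the same normal approximation can be sketched in a few lines of Python. This is only a check of the arithmetic above; the function and variable names are made up for illustration, and the exact square root of 65,000 is used rather than the rounded 255, so the z value differs slightly in the third decimal.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

n_discarded = 65_000           # ballots that would be thrown out
lead = 734                     # Riggs' lead after the recounts
sd = math.sqrt(n_discarded)    # standard deviation of a sum of 65,000 fair +1/-1 flips
z = lead / sd                  # how many standard deviations 734 is above the mean 0
p_flip = upper_tail(z)         # normal approximation to P(S > 734)

print(f"sd = {sd:.1f}, z = {z:.3f}")
print(f"P(result changes) = {p_flip:.6f}, P(no change) = {1 - p_flip:.6f}")
```

The same upper_tail function can be applied to any other number of standard deviations.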

Taking this reasoning further, let us suppose that throwing out the 65,000 votes results in a loss of 1530 votes for Riggs. This result is 6 standard deviations above the mean. Using the calculator again gives a probability of about 0.000000001 that this occurs. Thus if throwing out the votes results in a change of more than 1530 votes in the outcome then we are absolutely certain that the votes chosen to be challenged were not chosen at random. They are a biased sample of ineligible voters that contains more Democrats than one would expect. This is hardly surprising since the Republican party made the list, but this observation provides one more reason that the votes should not be thrown out.

One reader questioned the assumption of independence of party and registration status. One practical consideration is that if we don’t assume it we can’t compute anything. In ecological terms, independence is a null model. Once we reject it then we ask why. One explanation is that Democrats are more incompetent at filling out the registration form, but it seems more likely that the Republicans who made the list were more likely to put Democrats on it.

————————————-

As I have thought about the situation more I think that a convincing argument for not throwing out these votes can be based on statistical data. If the list consisted of say 3000 voters all of whom were Democrats, it would be ridiculous to throw out these votes, which would hand the election to Griffin.

I don’t have access to the list of 65,000 names and their parties but if for instance it consisted of 30,000 Democrats, 20,000 Independents and 15,000 Republicans this would show a bias, since the fractions of registered voters of the three types are 32% D, 38% I, 30% R. Information quoted in Riggs’ emails already points to an unexpectedly large number of college students on the list.
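
One way to put a number on "this would show a bias" is a chi-square goodness-of-fit test. The sketch below uses the hypothetical 30,000/20,000/15,000 split from the paragraph above (not real data about the list) and the registration percentages quoted there. The test itself is my choice of illustration, not something the argument depends on.

```python
import math

# Hypothetical party breakdown of the list, taken from the example in the text
observed = {"D": 30_000, "I": 20_000, "R": 15_000}
# Registration shares quoted in the text
registered = {"D": 0.32, "I": 0.38, "R": 0.30}

total = sum(observed.values())
# Counts we would expect if the list were a random sample of registered voters
expected = {party: total * share for party, share in registered.items()}

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
chi2 = sum((observed[p] - expected[p]) ** 2 / expected[p] for p in observed)

# Three categories give 2 degrees of freedom, and for 2 degrees of freedom
# the chi-square upper tail is exactly exp(-x/2)
p_value = math.exp(-chi2 / 2)

print(f"expected counts: {expected}")
print(f"chi-square = {chi2:.1f}, p-value = {p_value:.3g}")
```

With numbers this lopsided the statistic is in the thousands, so the p-value underflows to zero in floating point.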

I suspect that at the end of the process when the case reaches the NC Supreme Court it will be decided on party lines and Griffin will be elected by the same group that decided partisan gerrymandering is legal. As the actions of the legislature in the years that they have had a veto proof majority indicate, Republicans are not swayed by arguments about what is unfair. But I guess the large fraction of them who believe that Trump won the 2020 election indicates they don’t believe in statistics either.

Presidential Polls are Flawed

October 31, 2024 | Uncategorized | Rick Durrett, Ph.D.

This was written Sunday October 27, 2024 and will not be revised.

The president is not elected by popular vote.

I remember watching Headline News on CNN a few days before the 2012 election and hearing that the election between Obama and Romney was a statistical dead heat. Each candidate was predicted to have 47% of the vote. However, the results in the Electoral College were not close: Obama 332, Romney 206. People who forecasted on a state-by-state basis (Nate Silver and a talented amateur at Princeton) concluded in the days before the election that the probability Obama would win was 99%.

Even when the difference between the candidates is less than the margin of error of the poll it still contains information.

Let’s start with the question: what is the margin of error? It is often reported, but rarely with the sample size of the poll which is needed to be able to check the calculation. Some examples with both pieces of data are Franklin-Marshall for Penn 794/4.3; Emerson College for North Carolina 1226/3.6, and for Wisconsin 800/3.4; Marist for Arizona and Georgia 1193/3.9.

If Ɵ is the estimated frequency of votes for Harris then, using standard formulas from an intro stats class, the radius of the 95% confidence interval (the margin of error) is 1.96(Ɵ(1-Ɵ)/n)^0.5. Since Ɵ is usually in (0.4,0.6) we have 0.24 < Ɵ(1-Ɵ) ≤ 0.25, and replacing Ɵ(1-Ɵ) by 0.25 and 1.96 by 2 we can simplify this to 1/n^0.5, so our data points become

n       error
800     0.0353
1200    0.0288

somewhat smaller than the reported margins of error.
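
To make the comparison concrete, here is a small Python sketch that evaluates both the 1.96(Ɵ(1-Ɵ)/n)^0.5 formula (with Ɵ = 0.5) and the 1/n^0.5 shortcut for the four polls quoted above. The function names are invented for this example, and the reported margins are the ones listed in the text.

```python
import math

def margin_of_error(n, theta=0.5, z=1.96):
    """Radius of the 95% confidence interval for a proportion theta with sample size n."""
    return z * math.sqrt(theta * (1 - theta) / n)

def simple_margin(n):
    """The shortcut obtained by replacing 1.96 with 2 and theta(1-theta) with 1/4."""
    return 1 / math.sqrt(n)

# (poll, sample size, reported margin of error in percent), as quoted in the text
polls = [
    ("Franklin-Marshall, Penn.", 794, 4.3),
    ("Emerson, North Carolina", 1226, 3.6),
    ("Emerson, Wisconsin", 800, 3.4),
    ("Marist, Arizona and Georgia", 1193, 3.9),
]

for name, n, reported in polls:
    exact = 100 * margin_of_error(n)
    quick = 100 * simple_margin(n)
    print(f"{name:28s} n={n:5d}  formula={exact:.2f}%  1/sqrt(n)={quick:.2f}%  reported={reported}%")
```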

Frequencies should be reported to one decimal place and should be recalculated to remove responses that are not one of the two major candidates, to make it easier to compare predictions.

As some readers have noticed I have not gotten to #2 yet. That is coming soon. Bloomberg, the one poll that I have seen which reports frequencies with one decimal place had on October 23:

State      Harris/Trump   H/(H+T)   P(H win)   538 ave.
Michigan   49.6/46.9      51.4      79.2       H 0.6
Penn.      50/48.2        50.9      67.7       H 0.3
Nevada     48.8/48.3      50.3      59.2       H 0.2
Wisconsin  48.3/48        50.2      51.6       H 0.1
Arizona    49.1/46        50.2      51.6       T 1.8
Carolina   48.5/48.8      49.4      43.9       T 1.2
Georgia    48.4/49.9      49.3      35.2       T 1.5
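
The H/(H+T) column is the share of the vote among respondents who chose one of the two major candidates. As a minimal sketch, the recalculation looks like this in Python, using the first four rows of the table as input (the function name is made up for the example).

```python
def two_party_share(harris, trump):
    """Harris's percentage among respondents who picked Harris or Trump."""
    return 100 * harris / (harris + trump)

# (state, Harris %, Trump %) from the first four rows of the Bloomberg table
rows = [
    ("Michigan", 49.6, 46.9),
    ("Penn.", 50.0, 48.2),
    ("Nevada", 48.8, 48.3),
    ("Wisconsin", 48.3, 48.0),
]

for state, h, t in rows:
    print(f"{state:10s} {two_party_share(h, t):.1f}")
```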

If the uncertainties look large, it is because they are based on one poll. Once one has the average of several polls then the uncertainties decrease. Of course comparing Bloomberg with 538 average shows that there are sampling biases. The Washington Post is even more discordant. They have Harris leading in all states except Arizona. I got these figures from an article in Forbes but when I went to check them the Post wanted $2 to read the article.

Combining these projections to calculate the probability that each candidate wins is difficult without a computer and a good programmer. 538 uses simulations to conclude that, when this was written, Trump wins with probability 0.54 and Harris with probability 0.45.

https://projects.fivethirtyeight.com/2024-election-forecast/

One final selling point for people who publish polls: if you make predictions like this you can’t be wrong.

Trump 323, Biden 215

July 2, 2024 | Uncategorized | Rick Durrett, Ph.D.

This blog post is based on the previous one, The Arithmetic of Presidential Elections. There we used data from the last three presidential elections to argue that there are Red States with 170 electoral votes, Blue States with 215, and Swing States listed below with 153.

Turning to https://projects.fivethirtyeight.com/polls/president-general/2024/

for data, we see that Biden trails in every swing state

States      Votes   Biden   Trump
Ohio        18      36.8    46.1
Georgia     16      38.3    44.8
Arizona     11      38.9    43.9
Iowa        6       32      50.4
Carolina    15      38.0    45.2
Florida     29      36.5    46.1
*Penn.      20      41.6    43.6
*Wisc.      10      41.8    42.8
Nevada      6       38.7    42.6
*Michigan   16      41.2    43.0
*New Hamp.  4       42      44

Since this predicts Biden will only carry the Blue States this gives the prediction above.

Flipping the four closest states marked with * would result in 273 – 265, but likely the polls do not yet fully reflect Biden’s abysmal debate performance.
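
The arithmetic behind the headline and the 273 – 265 scenario can be checked with a short Python sketch. The totals 170, 215 and 153 come from the previous post and the four starred states from the table above; the variable names are only for illustration.

```python
# Electoral vote totals from "The Arithmetic of Presidential Elections"
RED, BLUE, SWING = 170, 215, 153

# The four closest swing states (marked with * in the table) and their electoral votes
starred = {"Penn.": 20, "Wisc.": 10, "Michigan": 16, "New Hamp.": 4}

# If Biden carries only the Blue States, Trump gets the Red and Swing States
trump = RED + SWING
biden = BLUE
print(f"Trump {trump}, Biden {biden}")

# Flipping the four starred states moves their votes from Trump to Biden
flipped = sum(starred.values())
print(f"After flipping: Trump {trump - flipped}, Biden {biden + flipped}")
```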

The Arithmetic of Presidential Elections

July 2, 2024 | Politics | Rick Durrett, Ph.D.

In November of 2012, I remember watching Robin Meade on Headline News saying that the race between Obama and Romney was a statistical dead heat. Both were projected to get 47% of the popular vote. Meanwhile a guy named Nate Silver on his web site 538.com was predicting based on state-by-state data that Obama would win by a comfortable margin and he did.

2012 Obama 332 Romney 206

2016 H. Clinton 227 Trump 304 Other 7

2020 Biden 306 Trump 232

My question here is: what will happen in 2024?

By now it is widely recognized that there are Red States, Blue States and Swing States. To identify these we gathered data by looking at the Wikipedia articles on the 2012, 2016, and 2020 Presidential Elections, where it can be downloaded into a spreadsheet. Rather than use our “knowledge of politics” to classify the states, we declared a state RED if it had gone Republican in all three elections, BLUE if it had gone Democratic in all three elections, and SWING otherwise.
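
The classification rule is simple enough to state in code. The sketch below is not the spreadsheet we actually used; it applies the rule to three states whose 2012, 2016 and 2020 winners are well known, and the function and variable names are invented for the example.

```python
def classify(winners):
    """Classify a state from the party ("R" or "D") that carried it in each election."""
    parties = set(winners)
    if parties == {"R"}:
        return "RED"
    if parties == {"D"}:
        return "BLUE"
    return "SWING"

# Winning party in 2012, 2016, 2020 for three illustrative states
results = {
    "Texas": ["R", "R", "R"],
    "California": ["D", "D", "D"],
    "Ohio": ["D", "R", "R"],
}

for state, winners in results.items():
    print(state, classify(winners))
```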

Cleaning the Data

Dealing with real data is annoying. One immediate headache is that in Maine and Nebraska two electoral votes are decided by the voters of the state as a whole, and the others are determined by votes in the various congressional districts. This is handled differently in the three data sets. Since ME-1, ME-2, NE-1, NE-2, and NE-3 were consistently used for the districts, we made the brilliant decision to use ME-0 and NE-0 for the statewide electoral votes.

The New and North states were also a problem. In one data set they are alphabetized by their abbreviations: N.C., N.D., N.H., N.J., N.M., and N.Y. while in the other two the names are written out. Before I realized this, there were some very surprising patterns in the voting of New Yorkers. Finally when I thought I had found all of the differences I saw that in the 2020 data set Maryland was abbreviated Md. so it swapped places with Massachusetts, and in all three data sets West Virginia was written as W.Va and hence came before Washington (the state).
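
In hindsight, one way to avoid these headaches is to map every label to a single spelling before sorting or merging. Here is a hedged Python sketch of that idea. The alias table covers only the abbreviations mentioned above, and the names ALIASES and normalize are made up for this illustration.

```python
# Abbreviations mentioned in the text, mapped to the written-out names
ALIASES = {
    "N.C.": "North Carolina",
    "N.D.": "North Dakota",
    "N.H.": "New Hampshire",
    "N.J.": "New Jersey",
    "N.M.": "New Mexico",
    "N.Y.": "New York",
    "Md.": "Maryland",
    "W.Va": "West Virginia",
}

def normalize(name):
    """Return the written-out state name; labels without an alias pass through unchanged."""
    name = name.strip()
    return ALIASES.get(name, name)

labels = ["Massachusetts", "Md.", "W.Va", "Washington"]
print(sorted(labels))                          # abbreviations sort out of place
print(sorted(normalize(s) for s in labels))    # consistent alphabetical order
```

Sorting after normalizing puts Maryland before Massachusetts and Washington before West Virginia, so rows from the three data sets line up.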

Our methodology

To create an ordering of the redness of states, we looked at the percentages of votes for Democrats minus the percentages of votes for Republicans, and ranked them by the sum of the numbers for the three elections. This produced the following classification:

Red States (170 Electoral Votes)

(Reddest first) NE-3, Wyo., W.Va., Oklahoma, Idaho, North Dakota, Utah, Kentucky, Arkansas, Alabama, South Dakota, Tenn., NE-0, Kansas, Louisiana, NE-1, Montana, Miss., Indiana, Missouri, Alaska, South Carolina, Texas

Blue States (215)

Minnesota, Colorado, ME-0, New Mexico, Oregon, New Jersey, Delaware, Washington, Illinois, Virginia, Connecticut, ME-1, Rhode Island, Vermont, New York, California, Mass., Maryland, Hawaii, D.C. (Bluest last) On the average 85% of people in DC voted Democratic in the last three elections, but that may change this year.

Swing states (153)

(almost red) Ohio, Georgia, Arizona, Iowa, ME-2, North Carolina, Florida, NE-2, Pennsylvania, Wisconsin, Nevada, Michigan, New Hampshire (almost blue)

Reality Check

According to Wikipedia “areas considered battlegrounds in the 2020 election were Arizona, Florida, Georgia, Iowa, Maine’s 2nd congressional district, Michigan, Minnesota, Nebraska’s 2nd congressional district, Nevada, New Hampshire, North Carolina, Ohio, Pennsylvania, Texas and Wisconsin.”

Moneyline Wagering II

March 31, 2024 | Probability | Rick Durrett, Ph.D.

Given that sports betting has come to North Carolina I am going to revisit the topic and derive some general formulas. The take home message is that while you might think you could make money betting against people who know less than you do, it is just like going to a casino. The typical moneyline wager looks like

Fav(orit)e -B (Under)Dog +A

What this means is that you have to bet $B on Fave to win $100, while if you bet $100 on Dog you win $A. Let p be the probability Dog wins.

For the bet on Fave to be fair we need 100(1-p) – Bp = 0 or p = 100/(100+B)

For the bet on Dog to be fair we need -100(1-p) + Ap= 0 or p = 100/(100+A)

In practice A< B so if 100/(100+B) < p < 100/(100+A) both bets are unfavorable.

For a numerical example, in the sweet 16 of the 2024 NCAA tournament we saw the bet

Marquette -300, NC State + 230, so both are unfavorable if

0.25 = 100/400 < p < 100/330 = 0.303
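
The two break-even probabilities are easy to compute for any moneyline. The Python sketch below evaluates them for the Marquette/NC State line and checks the expected value of each bet at a probability inside the interval. The function names are made up for the example, and p = 0.28 is just an arbitrary value between the two bounds.

```python
def break_even(B, A):
    """Return (low, high): for low < p < high both the Fave and Dog bets are unfavorable."""
    return 100 / (100 + B), 100 / (100 + A)

def expected_values(p, B, A):
    """Expected winnings of the Fave bet (risk B to win 100) and the Dog bet (risk 100 to win A)."""
    ev_fave = 100 * (1 - p) - B * p
    ev_dog = -100 * (1 - p) + A * p
    return ev_fave, ev_dog

B, A = 300, 230            # Marquette -300, NC State +230
low, high = break_even(B, A)
print(f"both bets unfavorable if {low:.3f} < p < {high:.3f}")

p = 0.28                   # an arbitrary probability inside the interval
ev_fave, ev_dog = expected_values(p, B, A)
print(f"p = {p}: EV(Fave) = {ev_fave:.2f}, EV(Dog) = {ev_dog:.2f}")
```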

In this particular case NC State won, which is not incredibly unexpected since when you roll two dice the probability of a total of 9 or more is 10/36 = 0.2778

Another way of looking at this is through the money bet. If a fraction x of people bet on Fave then

When Fave wins the average winnings are 100x – 100(1-x) which is < 0 if x < 1/2

When Dog wins the average winnings are -B x + A(1-x) which is < 0 if x > A/(A+B)

In our concrete example both bets are bad for players if 0.434 = 230/530 < x < 1/2
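
Here is the same calculation as a Python sketch, following the bookkeeping above: a fraction x of the bettors take Fave and the rest take Dog. The function name and the test value x = 0.45 are assumptions made for the illustration.

```python
def bettors_average(x, B, A):
    """Average winnings per bettor under each outcome when a fraction x bets on Fave."""
    when_fave_wins = 100 * x - 100 * (1 - x)   # Fave bettors win 100, Dog bettors lose 100
    when_dog_wins = -B * x + A * (1 - x)       # Fave bettors lose B, Dog bettors win A
    return when_fave_wins, when_dog_wins

B, A = 300, 230
threshold = A / (A + B)    # above this x the bettors lose on average when Dog wins
print(f"bettors lose under both outcomes if {threshold:.3f} < x < 0.5")

x = 0.45                   # an arbitrary fraction inside the interval
fave_wins, dog_wins = bettors_average(x, B, A)
print(f"x = {x}: Fave wins {fave_wins:.1f}, Dog wins {dog_wins:.1f}")
```

Whatever the bettors lose on average the sports book gains, which is why a value of x in this interval is so attractive to them.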

While the sports book has no control over the probability that Dog wins, they can control the fraction of money bet on the favorite by adjusting the odds over time, or using their knowledge of previous bets to choose good values of A and B.

When A/(A+B) < x < ½ then the sports book has an arbitrage opportunity. They will make money under either outcome. In the theory of option pricing it is assumed that arbitrage opportunities do not exist (or if they do are short lived), and based on this one can derive what is the right price for a “derivative security” such as a call or put option based on the stock prices.

If you want to learn about this look at Chapter 6 of my book Essentials of Stochastic Processes which you can download in pdf form from my web page. https://services.math.duke.edu/~rtd/EOSP/eosp.html

My goal here is to make two points (i) the sports books are not gambling: they have things set up so that they win money no matter what the outcome is, (ii) even though you are gambling against a group of people and it seems that you can win money if you are smarter than they are, that is an illusion. Like casino gambling, things are set up so that all of the bets are unfavorable. So like the TV commercials say look at sports betting as a way of enhancing the fan experience not as a way to make money. However with betting available 24-7, in game parlays and bonus bets designed to get you used to betting a lot of money, this like the lottery, is a scheme designed to take money from people who are gambling with money they can’t afford to lose.

Remembering Krishna Athreya

April 4, 2023 | Reflections | Rick Durrett, Ph.D.

As I write this, he passed away about a week ago. While the end of a person’s life is a sad time, it also provides an opportunity to reflect on the past. The first memory that came to my mind was of sitting in Sid Resnick’s office in the Stanford Statistics department in 1974 (or 1975.) A group of us got together once a week to make our way through the new book on Branching Processes by Athreya and Peter Ney. It was a welcome modernization of Harris’ 1963 book that started the subject. There were crisp proofs of the basic facts about Galton-Watson processes, Markov branching processes (with exponential lifetimes), the age-dependent case (general lifetimes), and multi-type branching processes. Over my career I have loaned this book to my graduate students many times to help them learn the subject. Remarkably I still have it but it is a little worse for wear.

#2 on my list of Athreya’s greatest hits is A new approach to the limit theory of recurrent Markov chains which appeared in the Transactions of the American Math Society, a paper that was written with Peter Ney in 1978. Again this is a contribution to an area founded by Ted Harris. While Markov chains on discrete state space are well understood, on a general state space numerous pathologies arise. Harris’ genius was to identify a class of these chains that have a tractable theory and cover a number of examples.

There is an elegant analytical theory described in the book by Revuz. However in 1978 several researchers, including Esa Nummelin who later developed a book based on this approach, had the same idea at the same time. I remember attending a session of talks at the 1978 meeting on Stochastic Processes and their Applications and hearing three talks on the topic. This was devastating for a Ph.D. student in the audience who was working on this for his thesis.

The idea is simple but brilliant: a Harris chain can be modified to have one state that is hit with positive probability starting from any state and having one such state is enough to carry out all the usual theory for the discrete case. Given this hint I am sure you can work out the details for yourself. I was so excited by the idea that I put it in the Markov chains chapter of my graduate text book.

Returning to a more traditional narrative: Krishna Athreya received his Ph.D. in 1967 from Stanford where he worked with Sam Karlin, a legendary probabilist with an impressive pedigree: son of Bochner, who was a grandson of Hilbert, and the mentor for 44 students including Tom Liggett and Charles Stone among many others. Athreya’s thesis topic was Multitype Continuous Time Markov Branching Processes and Some Classical Urn Schemes. Soon after he got his degree Athreya and Karlin worked on Branching processes in random environments. Two papers were published in Annals of Mathematical Statistics in 1971 since the Annals of Probability, which began in 1973, did not yet exist.

These two papers, like many in Athreya’s top 20 most cited on MathSciNet, contain a number of ideas that have not been fully explored. An example is the work with his Ph.D. student Jack Dai on random logistic maps. Last but not least, I would like to mention his 1994 paper on large deviations for branching processes which contains material that working probabilists should know. Athreya has left an impressive mathematical legacy that will enrich your life and research if you have the time to read it. It is sad that there will be no more work coming from him, but I hope others who read this will be inspired to continue his work.

The Tea-Cup Problem

February 11, 2023 | Uncategorized | Rick Durrett, Ph.D.

Here’s a little problem to test your skills at combinatorial probability.

You have a set of six cups and saucers. Two are NC State red R, two are UNC light blue b, and two are Duke dark blue B. You place the saucers in a line on the table RRbbBB. Then a blind man comes in and puts the cups on the saucers in random order. Let M be the number of cups that match the color of the saucer they are on. Your job is to compute the distribution of M.

To get you started I will specify a probability space, which is the first step in solving any problem of this type. I once thought it was good to number the cups but a student in my class this year taught me it was better to treat the two cups of a given color as indistinguishable so we have 6!/(2!2!2!) = 90 outcomes instead of 720. To help check the solution note that not only should the probabilities sum to 1, but we must have EM = 6(1/3)=2. In the next paragraph I will start to reveal the solution starting at 6 and working down, so if you want to discover it on your own you should stop scrolling.

P(M=6)=1/90. In our probability space there is only one outcome where the cups all match, which is better than the situation when the cups are numbered and there are 2 x 2 x 2 = 8.

P(M=5)=0. If say the 2R match and 2b match then the 2B must match so 5 is impossible

P(M=4)=12/90. Matching 2-2-0 is impossible by the reasoning for 5, so we must have 2-1-1. There are 3 ways to pick the color with two matches, and for each color with only one match 2 choices of where the matching cup is. The rest of the outcome is now forced, e.g., RRbBBb.

P(M=3)=16/90. Matching 2-1-0 is impossible so we must have 1-1-1. We can pick the locations of the matching cups in 2 x 2 x 2 ways. The other three nonmatching cups must be either BRb or bBR

P(M=2)=27/90. We can have 2-0-0. Once we pick the double matching color in 3 ways the rest is forced, e.g. RRBBbb. We can have 1-1-0. We can pick the color with no match in 3 ways and 2 x 2 ways for the location of the matching cups. Suppose dark blue has no match. Then the two B cups must be on R and b, but there are 2 ways to put R and b on the B saucers for a total of 24 + 3 = 27.

P(M=1) =24/90. We can pick the location of the matching cup in 6 ways. Suppose it is the first R saucer. The second Red saucer can be B or b (2 ways). If it is B then we have bb on the B saucers, and we can have RB or BR on the b saucers (x2). If it is b then we have BB on the b saucers and we have two possibilities on the B saucers, but autocorrect in Word will not let me type them.

P(M=0) =10/90. We can have BB on the Red saucers and then must have RR on b and bb on B. The situation is similar for bb on Red. This gives 2 outcomes. If we have {Bb} on the red saucers then we must have {RB} on b and {Rb} on B where the set braces indicate we have not specified the order, so there are 2 x 2 x 2 = 8 outcomes.

1+12+16+27+24+10=90, 6 x 1+12 x 4+3 x16 +2 x 27+1 x 24 = 180 (so the mean is 2).
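
If you would rather let a computer do the counting, the 90 outcomes are few enough to enumerate. This Python sketch is a brute-force check of the counts above, not the combinatorial argument itself. It lists every distinct arrangement of the cups and tallies the number of matches.

```python
from collections import Counter
from itertools import permutations

saucers = "RRbbBB"

# Cups of the same color are indistinguishable, so keep only the distinct arrangements
outcomes = set(permutations(saucers))

# M = number of positions where the cup color equals the saucer color
dist = Counter(
    sum(cup == saucer for cup, saucer in zip(cups, saucers))
    for cups in outcomes
)

for m in range(6, -1, -1):
    print(f"P(M={m}) = {dist[m]}/{len(outcomes)}")

mean = sum(m * count for m, count in dist.items()) / len(outcomes)
print("EM =", mean)
```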

A simple recipe for Chole

May 16, 2022 | Cooking QED | Rick Durrett, Ph.D.

This is the first of a series of posts in the Category: Cooking: QED, which stands for quick, easy and delicious. The last word may be a bit of a stretch but dumb or dull does not seem to set the right tone. The recipe I am about to share has a long history with my family. Soon after David was born, Susan went to a play group at Cornell for mothers of young children. There she met Smita Chandra, who was a nanny taking care of another family’s child. Despite the difference in “status” they became good friends. Susan spent many hours watching her develop the recipes for From Bengal to Punjab: The Cuisines of India by Smita Chandra, published on Oct 1, 1991. This and Smita’s two other books can be found on Amazon.

The cookbook was the source of my recipe for Chole. This dish is known as Channah Masala at the Indian restaurant Tandoor in the food court at West Campus Union on the Duke campus. Like covid the recipe has been radically changed by a series of mutations. This and all the other recipes in my blog are designed for two people.

Step 1. Open 2 cans (16 oz or 14.5 oz or whatever they are these days) garbanzo beans. Drain off the liquid, add ½ cup water, and cook 10 minutes on 70% in microwave. This method relieves the boredom of heating them in a sauce pan and allows for parallel processing

Step 2. Cut 1 medium onion and 1 medium tomato into small pieces. In a 3 or 4 quart pan cook the onion (once you have finished chopping it), and then add the tomato.

Step 3. By now the beans should be done. Drain off about half the water, add to pan, and stir to mix up the ingredients. Then add: 1t cumin, ½ t coriander, ½ t turmeric, ¼ t cayenne, 1 t garam masala, 1T lemon juice. T is not a typo it is Tablespoon versus teaspoon. Of course I don’t actually measure these things, just dump what looks like the right amount on top of the beans, and then stir to mix them up.

Step 4. Cook 5 minutes and let it sit on the still warm burner for a few minutes. Divide into four approximately 8 ounce servings. Keep one for tonight’s dinner and put the other three in the freezer (the appliance in the basement that is dedicated to this purpose, not the one that is part of your refrigerator)

To go along with the chole, get one pound of chicken tenders. Divide them into two batches and freeze one. Cut the chicken tenders into pieces that are about 1 inch long (or whatever size looks right to you). Saute them in a small amount of olive oil in a frying pan until they are done, and then cover with an appropriate amount of Tikka Masala Sauce, and continue heating until the sauce is warm.

Samosas (an Indian pasty with potatoes and peas) are the third part of the dinner. The ones I use come frozen and you cook them in a 375 oven for 15 minutes. Which of course means the first step in preparing dinner is to preheat the oven. We use the ones made by Sukhi Singh (www.sukhis.com). Before the pandemic there were 10 in a box but now there are only 8. Sukhi confidently says ”There are two types of people: people who love Indian cuisine, and those who just haven’t tried it yet.”

I wish I had the courage to say: ”There are two types of people: people who love probability, and those who haven’t read my books yet.” But I don’t want to follow in the footsteps of the Duke undergrad who plagiarized her commencement speech almost word for word from one that was given at Harvard a few years ago. I follow the rule: if you copy from one book it is plagiarism, if you copy from 10 it is scholarship. Of course you should change the numbers or the notation and introduce your own typos.

Fear, Loathing, and Surprise at the Kentucky Derby

May 7, 2022 | Rant | Rick Durrett, Ph.D.

NBC coverage begins today at 2:30PM with the race slated for 6:57PM. Last year Medina Spirit made 1.86 million dollars for a two minute race, eclipsing what Stormy Daniels was paid for what was presumably a somewhat longer ride on Donald Trump. The win was negated by the drug test Medina Spirit failed after the derby. Just as abruptly as the horse had reached the top of the sport, the feisty colt collapsed during a workout at Santa Anita Park in Arcadia, Calif.

This type of Shakespearean drama is rare at the Derby. The Kentucky Derby is Decadent and Depraved, Hunter Thompson wrote in a June 1970 article. This year’s spectacle featured a limited number of $1000 mint juleps in a signature that sold out well before race time. To be drunk no doubt by women in $10,000 hats saying “if the peasants have no food let them eat cake.” I can’t match Hunter’s style so I’ll leave you to read his article.

http://grantland.com/features/looking-back-hunter-s-thompson-classic-story-kentucky-derby/

The article is long but you have almost four and a half hours to kill before the race. According to Wikipedia Hunter rose to prominence with the publication in 1967 of Hell’s Angels, a book he wrote while spending a year riding with the motorcycle gang. The article on the derby is next in the narrative followed by his book Fear and Loathing in Las Vegas. I read the book as an undergrad. Based on what occurred in the book I am surprised he made it to age 67. The book is a surreal descent into drug abuse. Read the book, don’t see the 1998 movie starring Johnny Depp. It is almost as dreadful as the made for TV trial co-starring Amber Heard, a film noir version of the old show Lifestyles of the Rich and Famous.

* * * * *

Saturday night after watching most of the news on CBS I switched over to watch the running of the Derby. Running a horse in the Derby is the dream of everyone who races horses. I remember my dentist in Ithaca having a horse in the race one year. It fell 50 yards out of the gate, broke its leg, and had to be euthanized.

The owners of Rich Strike had a much happier experience. The colt wasn’t even in the field until Friday, when he drew into the race after another horse was scratched. Wearing #21, he started in the 20th chute far away from the rail and carried 80:1 odds, but he came from behind to pull off one of the biggest shocks in Derby history.

A poetic writer in the New York Times wrote that he seemed to follow Moses’ path through the Red Sea to a three-quarter length victory. In more prosaic terms his first step to victory was to get from the extreme edge to the middle of the pack. Then at about the ¾ mark in the race he moved through the pack to a commanding lead. However in the modern era I don’t need words; you can see it for yourself:

https://www.youtube.com/watch?v=DFb2XSDv6vE

The horse moved so fast and was so agitated after the finish, trying to bite the horse of the rider who was trying to guide him to the winner’s area, I thought for a moment that this would be a situation where the horse got his speed from a syringe but there hasn’t been anything on the news so I assume that this time the horse passed his drug test.

The owner who bought the horse for $30,000 was charming in his excitement: “What planet is this?” Dawson said. “I feel like I have been propelled somewhere. I’m not sure. This is unbelievable. I asked my trainer up on the stage, I said, ‘Are you sure this is not a dream? Because it can’t be true.’ He assured me this is real. I said OK.”

So there can be feel good stories at the Derby and not only for the owners. Rich Strike paid $163.60 to win on a $2 bet. The 21-3 exacta paid $4101.20 on a $2 bet; the 21-3-10 Trifecta $14,870.70 on a $1 bet, and the Superfecta 21-3-10-13 $321,500.10 on a $1 bet, so even if you had bet on all P(20,4) = 116,280 possibilities you would have won big.
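
The "won big" claim is a one-line calculation, sketched here in Python under the assumption that you place a $1 superfecta bet on every ordered choice of four horses from a 20-horse field. The variable names are only for illustration.

```python
import math

field = 20                                # horses in the race
tickets = math.perm(field, 4)             # ordered choices of first through fourth place
cost = 1.00 * tickets                     # one $1 bet on each ordering
payout = 321_500.10                       # what the winning 21-3-10-13 ticket paid on $1
profit = payout - cost

print(f"tickets = {tickets:,}, cost = ${cost:,.2f}, profit = ${profit:,.2f}")
```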

WORDLE for TYROS

April 8, 2022 | Uncategorized | Rick Durrett, Ph.D.

Tyro is a bit of crosswordese that means beginner or novice. Writing this reminds me of my first WORDLE in which I failed to guess TACIT in six tries. A tweet related to this puzzle which found its way into Rex Parker’s NYTimes Xword blog said something like the following: The answer reminds me of why I don’t do crosswords they are done by old people writing old words into the grid.

Turning to the main subject, as most of you probably know in WORDLE you get six tries to guess a five-letter word. On each turn you guess a five-letter word, a rule which prevents you from guessing say AEIOU to find out what vowels are present. If a letter is in the correct location it shows green. If it is in the puzzle but not in the right place then it is white. If it is not in the answer it is gray. (Colors may vary) A copy of a computer keyboard on the screen allows you to enter your guesses and shows the status of each letter you have guessed.

As I start to give my advice I must admit I am still a novice but that never stopped TRUMP from pontificating on how to be president. In thinking about how to play WORDLE it is useful to know how frequently letters are used in the English language.

When Samuel Morse wanted to figure this out in the 1800s, he looked at the frequency of letters in sets of printer’s type which he found to be (numbers in thousands) E (12), T (9), A, I, N, O, S (8), H (6.4), R (6.2), D (4.4), L (4), U (3.4), C, M (3), etc. With computers and electronic dictionaries at our disposal we have a more precise idea (numbers are percentages).

E: 11.16   A: 8.50    R: 7.58    I: 7.55    O: 7.16            41.95
T: 6.95    N: 6.65    S: 5.74    L: 5.49    C: 4.54          + 29.37 = 71.32
U: 3.63    D: 3.38    P: 3.17    M: 3.01    H: 3.00          + 16.19 = 87.51
G: 2.47    B: 2.07    F: 1.81    Y: 1.78    W: 1.29             9.42
K: 1.102   V: 1.007   X: 0.290   Z: 0.272   J,Q: 0.196 each     3.06

Here the numbers in the last column are the sum of the numbers on the row and we have made 26 divisible by 5 by putting J and Q, which have the same frequency to 3 significant figures, into the same entry. This table becomes somewhat irrelevant once you visit

https://leancrew.com/all-this/2022/01/wordle-letters/

to find the letter frequencies in five letter words.

A: 10.5   E: 10.0   R: 7.2   O: 6.6          I: 6.1     40.4
S: 5.6    T: 5.6    L: 5.6   N: 5.2          U: 4.4   + 26.4 = 66.8
Y: 3.6    C: 3.6    D: 3.3   H: 3.1          M: 3.1   + 16.7 = 83.5
P: 3.0    B: 2.7    G: 2.6   K: 2.1          W: 1.6     12.0
F: 1.6    V: 1.1    Z: 0.6   X,J: 0.4 each   Q: 0.2     4.3

Here E has fallen from the #1 spot. However, with the exception of Y climbing from 19th to 11th and P dropping from 13th to 16th it doesn’t seriously change the rankings, so I am not going to change my blog post due to this late breaking information.

The next thing to decide about WORDLE is what is your definition of success. I think of the game as being like a par-5 in golf. To take the analogy to a ridiculous extreme you can think of the game as par-5 in a tournament which uses the modified Stableford scoring system (like the Barracuda Open played at a course next to Lake Tahoe). Double bogey or worse (= not solving the puzzle) is -3, bogey (six guesses) -1, par (five) 0, birdie (four) 2, eagle (three) 5, and double eagle (two) 8 points.

I am not one who is good at brilliant guesses, so my personal metric is to maximize the probability of solving the puzzle. Hence I follow the approach of Zach Johnson who won the 2007 Masters by “laying up” on each par five. Most of these holes are reachable in two (for the pros) but 13 and 15 have water nearby so trying to hit the green in two and putting your ball in the water can lead to a bogey or worse. Zach hit his second shots to within 80-100 yards of the green so he could use his wedge to hit the ball close and make an old school birdie.

My implementation of his strategy is to start with TRAIL, NODES, and CHUMP, which cover all five traditional vowels and the 15 most frequent letters in English. In the five letter word frequencies these 15 letters account for 82.9% (CHUMP has P rather than Y), so the expected number of letters in the word this uncovers is 0.829 x 5 = 4.145 if all five letters in the word are different. (Recall from elementary probability that if X_i is the indicator of the event that the i-th letter appears among the 15 guessed then E(X₁ + … + X₅) = 5EX₁.) Dividing by 5 shows that the expected number of letters in the right position is 0.829 (assuming again all letters are different), so on the average we expect a green and three yellows.
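
To try out your own starting words, the coverage calculation can be sketched in Python. The frequencies are the rounded five letter word percentages from the table above, with the shared X,J entry read as 0.4 for each letter (an assumption about that table). The function and variable names are made up for this example.

```python
# Letter frequencies (percent) in five letter words, copied from the table above.
# Assumption: the shared "X,J: 0.4" entry means 0.4 for each of the two letters.
FREQ = {
    "A": 10.5, "E": 10.0, "R": 7.2, "O": 6.6, "I": 6.1,
    "S": 5.6, "T": 5.6, "L": 5.6, "N": 5.2, "U": 4.4,
    "Y": 3.6, "C": 3.6, "D": 3.3, "H": 3.1, "M": 3.1,
    "P": 3.0, "B": 2.7, "G": 2.6, "K": 2.1, "W": 1.6,
    "F": 1.6, "V": 1.1, "Z": 0.6, "X": 0.4, "J": 0.4, "Q": 0.2,
}

def coverage(words):
    """Total frequency (percent) of the distinct letters used by the guesses."""
    letters = set("".join(words).upper())
    return letters, sum(FREQ[c] for c in letters)

letters, pct = coverage(["TRAIL", "NODES", "CHUMP"])
expected_found = 5 * pct / 100    # expected letters uncovered if the answer has 5 distinct letters

# The 15 most frequent letters in five letter words, for comparison
top15 = set(sorted(FREQ, key=FREQ.get, reverse=True)[:15])

print(f"{len(letters)} letters cover {pct:.1f}%, expected letters found = {expected_found:.3f}")
print("in the top 15 but not guessed:", sorted(top15 - letters))
```

Swapping in a different third word and rerunning shows how much coverage you gain or lose.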

Of course the answer can have repeated letters and can be chosen by the puzzle creator to be unusual, e.g., EPOXY or FORAY which were recent answers. (It is now April 8). In several cases my first three guesses have produced only 2 letters in the word, which makes the birdie putt very difficult. Even when one has four letters, as in _OUND, possibilities are bound, found, mound, pound, round, sound, wound, even though some of these are eliminated if they are in the first 15 guessed.

If there are three (or more) possibilities for the one unknown letter, then it can be sensible to use a turn to see which of these are possible in order to get the answer in two more guesses rather than three. Or you can be like Tiger one year at Augusta and “go for it all”: give your birdie putt on the 15th hole a good hard rap and watch it roll off the green into the creek. Fortunately for him, the rules of golf allowed him to play his next shot from the previous position.

These rules I have described are just to give you a start at finding a better strategy. You should choose your own three words not only to feel good about having done it yourself, but because the order of the letters can influence the probability of success. Of course you can also choose only to guess two (or only one) and then make your guess based on the result. When I get several letters on the first two guesses, I have often substituted another word for CHUMP to get to the solution faster but I have often regretted that. On the other hand sometimes when I play CHUMP I am disappointed to get no new positive information about what is in the word.