
The Tea-Cup Problem

Here’s a little problem to test your skills at combinatorial probability.

You have a set of six cups and saucers. Two are NC State red R, two are UNC light blue b, and two are Duke dark blue B. You place the saucers in a line on the table, RRbbBB. Then a blind man comes in and puts the cups on the saucers in random order. Let M be the number of cups that match the color of the saucer they are on. Your job is to compute the distribution of M.

To get you started, I will specify a probability space, which is the first step in solving any problem of this type. I once thought it was good to number the cups, but a student in my class this year taught me it was better to treat the two cups of a given color as indistinguishable, so we have 6!/(2!2!2!) = 90 outcomes instead of 6! = 720. To help check the solution, note that not only should the probabilities sum to 1, but we must have EM = 6(1/3) = 2, since each of the six saucers gets a cup of its own color with probability 2/6 = 1/3. In the next paragraph I will start to reveal the solution, starting at 6 and working down, so if you want to discover it on your own you should stop scrolling.

P(M=6) = 1/90. In our probability space there is only one outcome where the cups all match, which is simpler than the situation when the cups are numbered and there are 2 x 2 x 2 = 8 such outcomes out of 720.

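P(M=5) = 0. Five matches would require two colors to match completely. If, say, both R cups and both b cups match, then the two B cups must be on the two B saucers, so 5 matches is impossible.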

P(M=4) = 12/90. Matching 2-2-0 is impossible by the reasoning for 5, so we must have 2-1-1. There are 3 ways to pick the color with two matches, and for each color with only one match there are 2 choices of where the matching cup is, for a total of 3 x 2 x 2 = 12. The rest of the outcome is now forced, e.g., RRbBBb.

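P(M=3) = 16/90. Matching 2-1-0 is impossible (if, say, both R cups and one b cup match, the other b cup must sit on a B saucer, so one of the two B cups lands on the other B saucer), so we must have 1-1-1. We can pick the locations of the matching cups in 2 x 2 x 2 ways. On the remaining R, b, and B saucers (in that order), the three nonmatching cups must be either BRb or bBR, for a total of 8 x 2 = 16.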

P(M=2) = 27/90. We can have 2-0-0: once we pick the doubly matching color in 3 ways, the rest is forced, e.g., RRBBbb. We can also have 1-1-0: we can pick the color with no match in 3 ways and the locations of the matching cups in 2 x 2 ways. Suppose dark blue has no match. Then the two B cups must be on the remaining R and b saucers, and there are 2 ways to put the remaining R and b cups on the B saucers. This gives 3 x 2 x 2 x 2 = 24, for a total of 24 + 3 = 27.

P(M=1) = 24/90. We can pick the location of the matching cup in 6 ways. Suppose it is the first R saucer. The second R saucer can get a B or a b cup (2 ways). If it is B, then the two b cups must be on the B saucers, and we can have RB or BR on the b saucers (2 ways). If it is b, then the two B cups must be on the b saucers, and we can have Rb or bR on the B saucers (2 ways). This gives 6 x 2 x 2 = 24.

P(M=0) = 10/90. We can have BB on the R saucers, and then we must have RR on the b saucers and bb on the B saucers. The situation is similar for bb on the R saucers. This gives 2 outcomes. If we have {Bb} on the R saucers, then we must have {RB} on the b saucers and {Rb} on the B saucers, where the set braces indicate we have not specified the order, so there are 2 x 2 x 2 = 8 outcomes, for a total of 10.

Check: 1 + 12 + 16 + 27 + 24 + 10 = 90, and 6 x 1 + 4 x 12 + 3 x 16 + 2 x 27 + 1 x 24 = 180, so EM = 180/90 = 2.
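A brute-force check of the whole distribution, as a short Python sketch (the post itself contains no code, and `match_counts` is an illustrative name). It uses the same 90-outcome probability space, obtained by collapsing the 720 orderings of RRbbBB into distinct ones, and should reproduce the counts 1, 0, 12, 16, 27, 24, 10 and EM = 2.

```python
from collections import Counter
from fractions import Fraction
from itertools import permutations

def match_counts(saucers: str = "RRbbBB") -> tuple[dict[int, int], int]:
    """Count outcomes by M, the number of cups whose color matches their saucer.

    Cups of one color are indistinguishable, so the sample space is the set
    of distinct orderings of the cup colors: 6!/(2!2!2!) = 90 of the 720.
    """
    outcomes = set(permutations(saucers))
    by_m = Counter(
        sum(cup == saucer for cup, saucer in zip(cups, saucers))
        for cups in outcomes
    )
    # List every M from 6 down to 0, including values that never occur (M = 5).
    return {m: by_m[m] for m in range(len(saucers), -1, -1)}, len(outcomes)

counts, total = match_counts()
for m, c in counts.items():
    print(f"P(M={m}) = {c}/{total}")
print("EM =", Fraction(sum(m * c for m, c in counts.items()), total))
```

Numbering the cups instead (720 outcomes) would multiply every count by 2!2!2! = 8 and leave the probabilities unchanged.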

WORDLE for TYROS

Tyro is a bit of crosswordese that means beginner or novice. Writing this reminds me of my first WORDLE, in which I failed to guess TACIT in six tries. A tweet related to this puzzle, which found its way into Rex Parker’s NYTimes Xword blog, said something like the following: “The answer reminds me of why I don’t do crosswords: they are done by old people writing old words into the grid.”

Turning to the main subject, as most of you probably know, in WORDLE you get six tries to guess a five-letter word. Each guess must itself be a real five-letter word, a rule which prevents you from guessing, say, AEIOU to find out which vowels are present. If a letter is in the correct location, it shows green. If it is in the answer but not in the right place, it shows yellow. If it is not in the answer, it is gray. (Colors may vary.) A copy of a computer keyboard on the screen allows you to enter your guesses and shows the status of each letter you have guessed.
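To make the coloring rule concrete, here is a minimal Python sketch of how a guess could be scored against the answer, with G for green, Y for yellow, and . for gray. The handling of repeated letters (greens are assigned first, and a misplaced letter is yellow only while unmatched copies of it remain in the answer) is an assumption about how the game resolves duplicates, and `wordle_feedback` is a hypothetical name.

```python
from collections import Counter

def wordle_feedback(guess: str, answer: str) -> str:
    """Score a guess: G = green, Y = yellow, . = gray (one symbol per letter)."""
    guess, answer = guess.upper(), answer.upper()
    result = ["."] * len(guess)
    # Copies of answer letters that are not matched in place; only these
    # can turn a misplaced guess letter yellow.
    unmatched = Counter(a for g, a in zip(guess, answer) if g != a)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"  # right letter, right place
    for i, g in enumerate(guess):
        if result[i] == "." and unmatched[g] > 0:
            result[i] = "Y"  # in the answer, but somewhere else
            unmatched[g] -= 1
    return "".join(result)

print(wordle_feedback("TRAIL", "TACIT"))  # G.YG.
print(wordle_feedback("SPEED", "ABIDE"))  # ..Y.Y (only one E is marked)
```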

As I start to give my advice, I must admit I am still a novice, but that never stopped TRUMP from pontificating on how to be president. In thinking about how to play WORDLE, it is useful to know how frequently letters are used in the English language.

When Samuel Morse wanted to figure this out in the 1800s, he looked at the frequency of letters in sets of printer’s type, which he found to be (counts in thousands) E (12), T (9), A, I, N, O, S (8), H (6.4), R (6.2), D (4.4), L (4), U (3.4), C, M (3), etc. With computers and electronic dictionaries at our disposal, we have a more precise idea (numbers are percentages).

E: 11.16   A: 8.50   R: 7.58   I: 7.55   O: 7.16   41.95

T: 6.95   N: 6.65   S: 5.74   L: 5.49   C: 4.54   + 29.37 = 71.32

U: 3.63   D: 3.38   P: 3.17   M: 3.01   H: 3.00   + 16.19 = 87.51

G: 2.47   B: 2.07   F: 1.81   Y: 1.78   W: 1.29   9.42

K: 1.102   V: 1.007   X: 0.290   Z: 0.272   J,Q: 0.196   3.06

Here the numbers in the last column are the sums of the numbers in each row (in rows 2 and 3 followed by the running total), and we have made the 26 letters fit into rows of five by putting J and Q, which have the same frequency to 3 significant figures, into the same entry. This table becomes somewhat irrelevant once you visit

https://leancrew.com/all-this/2022/01/wordle-letters/

to find the letter frequencies in five-letter words.

A: 10.5                E: 10.0                 R: 7.2                   O: 6.6                  I: 6.1                    40.4

S: 5.6                   T: 5.6                   L: 5.6                   N: 5.2                  U: 4.4                  + 26.4   = 66.8

Y: 3.6                   C: 3.6                   D: 3.3                  H: 3.1                  M: 3.1                 + 16.7   = 83.5

P: 3.0                   B: 2.7                   G: 2.6                  K: 2.1                   W: 1.6                 12.0

F: 1.6                   V: 1.1                   Z: 0.6                   X,J: 0.4                Q: 0.2                  4.3

Here E has fallen from the #1 spot. However, with the exception of Y climbing from 19th to 11th and P dropping from 13th to 16th, it doesn’t seriously change the rankings, so I am not going to change my blog post due to this late-breaking information.

The next thing to decide about WORDLE is what your definition of success is. I think of the game as being like a par-5 in golf. To take the analogy to a ridiculous extreme, you can think of the game as a par-5 in a tournament which uses the modified Stableford scoring system (like the Barracuda Open played at a course next to Lake Tahoe). Double bogey or worse (= not solving the puzzle) is -3, bogey (six guesses) -1, par (five) 0, birdie (four) 2, eagle (three) 5, and double eagle (two) 8 points.

I am not one who is good at brilliant guesses, so my personal metric is to maximize the probability of solving the puzzle. Hence I follow the approach of Zach Johnson, who won the 2007 Masters by “laying up” on each par five. Most of these holes are reachable in two (for the pros), but 13 and 15 have water nearby, so trying to hit the green in two and putting your ball in the water can lead to a bogey or worse. Zach hit his second shots to within 80-100 yards of the green so he could use his wedge to hit the ball close and make an old-school birdie.

My implementation of his strategy is to start with TRAIL, NODES, and CHUMP, which covers all five traditional vowels and consists of the 15 most frequent letters in the dictionary table. Using the five-letter-word frequencies, the expected number of letters in the word that this uncovers is 0.835 x 5 = 4.175 if all five letters in the word are different. (Recall from elementary probability that if X_i is the indicator of the event that the ith letter of the answer is among the 15 most frequent, then E(X_1 + ... + X_5) = 5EX_1.) Dividing by 5 shows that the expected number of letters in the right position is 0.835 (assuming again that all letters are different, and that each uncovered letter is equally likely to be in any of the five positions of our guesses), so on average we expect about one green and three yellows.
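The 0.835 is the total frequency of the 15 most common letters in the five-letter-word table. A small Python sketch (the names `FREQ` and `coverage` are illustrative) compares that with what TRAIL, NODES, CHUMP actually cover in the same table: since P (3.0) stands in for Y (3.6) there, the opening covers 0.829 rather than 0.835, so the expected count is 5 x 0.829 = 4.145 rather than 4.175.

```python
# Letter frequencies (percent) in five-letter words, from the second table.
FREQ = {
    "A": 10.5, "E": 10.0, "R": 7.2, "O": 6.6, "I": 6.1,
    "S": 5.6, "T": 5.6, "L": 5.6, "N": 5.2, "U": 4.4,
    "Y": 3.6, "C": 3.6, "D": 3.3, "H": 3.1, "M": 3.1,
    "P": 3.0, "B": 2.7, "G": 2.6, "K": 2.1, "W": 1.6,
    "F": 1.6, "V": 1.1, "Z": 0.6, "X": 0.4, "J": 0.4, "Q": 0.2,
}

def coverage(words):
    """Chance that one letter of the answer is among the letters in `words`."""
    letters = set("".join(words).upper())
    return sum(FREQ[c] for c in letters) / 100

top15 = sum(sorted(FREQ.values(), reverse=True)[:15]) / 100
opening = coverage(["TRAIL", "NODES", "CHUMP"])

# Linearity of expectation: E(X1 + ... + X5) = 5 * p. This counts letter
# positions, which equals the number of distinct letters found only when the
# answer has no repeated letters.
for name, p in [("top 15 letters", top15), ("TRAIL NODES CHUMP", opening)]:
    print(f"{name}: p = {p:.3f}, expected letters found = {5 * p:.3f}")
```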

Of course, the answer can have repeated letters and can be chosen by the puzzle creator to be unusual, e.g., EPOXY or FORAY, which were recent answers (it is now April 8). In several cases my first three guesses have produced only 2 letters in the word, which makes the birdie putt very difficult. Even when one has four letters, as in _OUND, the possibilities are bound, found, mound, pound, round, sound, and wound, although some of these are eliminated because their first letter is among the 15 already guessed.
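The elimination in the _OUND example can be sketched in a few lines of Python. This is a simplification that only uses the fact that a guessed letter which is not in the answer shows gray; the candidate list is the one in the paragraph above, and the names are illustrative.

```python
# Words fitting _OUND, as listed above.
CANDIDATES = ["BOUND", "FOUND", "MOUND", "POUND", "ROUND", "SOUND", "WOUND"]
GUESSED = set("TRAIL" + "NODES" + "CHUMP")  # letters used in the three openers
KNOWN = set("OUND")                         # letters known to be in the answer

def still_possible(word):
    """A letter outside KNOWN that we already guessed would have shown gray."""
    return all(c in KNOWN or c not in GUESSED for c in word)

remaining = [w for w in CANDIDATES if still_possible(w)]
print(remaining)  # ['BOUND', 'FOUND', 'WOUND']
```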

If there are three (or more) possibilities for the one unknown letter, then it can be sensible to use a turn to see which of these are possible, in order to get the answer in two more guesses rather than three. Or you can be like Tiger one year at Augusta and “go for it all”: give your birdie putt on the 15th hole a good hard rap and watch it roll off the green into the creek. Fortunately for him, the rules of golf allowed him to play his next shot from the previous position.
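For the “use a turn to see which of these are possible” idea, here is a Python sketch that groups the remaining candidates by which letters of a probe word they contain. FIBER is just one hypothetical probe chosen for illustration, and the grouping ignores the extra green/yellow information a real guess would give. A probe does its job when every group holds a single word, since the next guess is then certain.

```python
from collections import defaultdict

def split_by_probe(candidates, probe):
    """Group candidate answers by which letters of `probe` they contain."""
    groups = defaultdict(list)
    for word in candidates:
        key = "".join(c for c in probe if c in word)  # probe letters that would not be gray
        groups[key].append(word)
    return dict(groups)

groups = split_by_probe(["BOUND", "FOUND", "WOUND"], "FIBER")
print(groups)  # {'B': ['BOUND'], 'F': ['FOUND'], '': ['WOUND']}
print("probe settles it:", all(len(g) == 1 for g in groups.values()))
```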

The rules I have described are just to give you a start at finding a better strategy. You should choose your own three words, not only to feel good about having done it yourself, but also because the order of the letters can influence the probability of success. Of course, you can also choose to guess only two (or only one) and then make your next guess based on the result. When I get several letters on the first two guesses, I have often substituted another word for CHUMP to get to the solution faster, but I have often regretted that. On the other hand, sometimes when I play CHUMP I am disappointed to get no new positive information about what is in the word.

Duke grads vote on Union

According to the official press release: “Of the 1,089 ballots cast, 691 voted against representation (“NO”) by SEIU and 398 for representation by SEIU (“YES”). There were, however, 502 ballots challenged based on issues of voter eligibility. Because the number of challenged ballots is greater than the spread between the “YES” and “NO” votes, the challenges could determine the outcome and will be subject to post-election procedures of the NLRB.”

The obvious question is: what is the probability that this would change the outcome of the election? If the NO’s lose 397 votes, and hence the YES’s lose 105, on the recount, the outcome will be 294 NO, 293 YES. A fraction 0.6345 of the votes were NO. We should treat this as an urn problem, but to get a quick answer you can suppose the number of YES votes lost is Binomial(502, 0.3655). In the old days I would have had to trot out Stirling’s formula and compute for an hour to get the answer, but now all I have to do is type into my vintage TI-83 calculator

binomcdf(502, 0.3655, 105) = 2.40115 x 10^-14

i.e., this is the probability that 105 or fewer YES votes are lost.
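The calculator step is easy to reproduce with a minimal Python sketch using only `math.comb`. Summing the Binomial(502, 0.3655) probabilities for 0 through 105 lost YES votes should reproduce the 2.40115 x 10^-14 above, while the single term at 105 is only about 1.3 x 10^-14. Since losing 105 YES votes still leaves NO ahead 294 to 293, YES actually wins only if at most 104 are lost, so the cumulative probability at 105 slightly overstates the chance of a reversal. The last quantity treats it as an urn problem, with the 502 challenged ballots drawn without replacement from the 1,089 cast; that form of the urn model is an assumption, not a calculation from the post.

```python
from math import comb

NO_VOTES, YES_VOTES, CHALLENGED = 691, 398, 502
P_YES = 0.3655  # fraction of YES votes, as in the post

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p), by direct summation."""
    return sum(binom_pmf(j, n, p) for j in range(k + 1))

def hypergeom_cdf(k, successes, failures, draws):
    """P(Y <= k), Y = successes among `draws` taken without replacement."""
    ways = sum(comb(successes, j) * comb(failures, draws - j) for j in range(k + 1))
    return ways / comb(successes + failures, draws)

pmf_105 = binom_pmf(105, CHALLENGED, P_YES)
cdf_105 = binom_cdf(105, CHALLENGED, P_YES)  # at most 105 YES votes lost
cdf_104 = binom_cdf(104, CHALLENGED, P_YES)  # YES wins only if at most 104 are lost
urn_104 = hypergeom_cdf(104, YES_VOTES, NO_VOTES, CHALLENGED)
print(f"{pmf_105:.3e} {cdf_105:.3e} {cdf_104:.3e} {urn_104:.3e}")
```

Sampling without replacement shrinks the variance, so the urn answer is smaller still than the binomial one.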

Regular readers of this blog will remember that I made a similar calculation to show that there was a very small probability that the 62,500 provisional ballots would change the outcome of the North Carolina governor’s election, since before they were counted Cooper had a 4,772-vote lead over McCrory. If we flip 62,500 coins, then the standard deviation of the number of votes for McCrory is

$$\left(62{,}500 \cdot \tfrac{1}{4}\right)^{1/2} = 125$$

So McCrory would need 33,636 votes, which is 2,386 above the mean, or 19.09 standard deviations. However, as later results showed, this reasoning was flawed: Cooper’s lead grew to more than 10,000 votes. This is because, as I learned later, provisional ballots have a greater tendency to be Democratic, while absentee ballots tend to be Republican.
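The coin-flipping arithmetic above as a short Python sketch. It bakes in the assumption that each provisional ballot is a fair coin, which is exactly the assumption the later results undercut; the variable names are illustrative.

```python
import math

N_PROVISIONAL = 62_500
COOPER_LEAD = 4_772

mean = N_PROVISIONAL / 2                   # McCrory's expected share if each ballot is a fair coin
sd = math.sqrt(N_PROVISIONAL * 0.5 * 0.5)  # sd of Binomial(62,500, 1/2) = 125
# McCrory gains 2x - 62,500 on the margin from x votes; this x erases the lead.
needed = math.ceil((N_PROVISIONAL + COOPER_LEAD) / 2)
z = (needed - mean) / sd
print(needed, needed - mean, sd, round(z, 2))  # 33636 2386.0 125.0 19.09
```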

Is this all just #fakeprobability? Let’s turn to a court case, de Martini versus Power. In a close election in a small town, 2,656 people voted for candidate A compared to 2,594 who voted for candidate B, a margin of victory of 62 votes. An investigation of the election found that 136 of the people who voted in the election should not have. Since this is more than the margin of victory, should the election results be thrown out even though there was no evidence of fraud on the part of the winner’s supporters?

In my wonderful book Elementary Probability for Applications, this problem is analyzed from the urn point of view. Since I was much younger when I wrote the first version of its predecessor in 1993, I wrote a program to add up the probabilities and got 7.492 x 10^-8. That computation supported the Court of Appeals decision to overturn a lower court ruling that voided the election in this case. If you want to read the decision, you can find it at

http://law.justia.com/cases/new-york/court-of-appeals/1970/27-n-y-2d-149-0.html
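Here is a sketch of the urn calculation in Python, assuming the urn model is this: the 136 invalid votes are a random sample, drawn without replacement, from the 5,250 votes cast, and the result is in doubt if removing them produces a tie or a reversal. Under that reading the answer should come out close to the 7.492 x 10^-8 quoted above; the variable names are illustrative.

```python
from math import comb

A_VOTES, B_VOTES, INVALID = 2656, 2594, 136
margin = A_VOTES - B_VOTES  # 62

# If a of the invalid votes went to A (and INVALID - a to B), removing them
# leaves A ahead by margin - a + (INVALID - a); that is <= 0 once a >= threshold.
threshold = -(-(margin + INVALID) // 2)  # ceiling of 198 / 2 = 99

# Urn model: a is hypergeometric, so add up the probabilities for a >= threshold.
ways = sum(comb(A_VOTES, a) * comb(B_VOTES, INVALID - a)
           for a in range(threshold, INVALID + 1))
prob = ways / comb(A_VOTES + B_VOTES, INVALID)
print(threshold, f"{prob:.4g}")
```

Exact integer arithmetic with `math.comb` avoids any need for Stirling’s formula; the single division at the end converts the ratio of huge integers to a float.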