Remembering Krishna Athreya

 

As I write this, he passed away about a week ago. While the end of a person's life is a sad time, it also provides an opportunity to reflect on the past. The first memory that came to my mind was of sitting in Sid Resnick's office in the Stanford Statistics department in 1974 (or 1975). A group of us got together once a week to make our way through the new book on Branching Processes by Athreya and Peter Ney. It was a welcome modernization of Harris' 1963 book that started the subject. There were crisp proofs of the basic facts about Galton-Watson processes, Markov branching processes (with exponential lifetimes), the age-dependent case (general lifetimes), and multi-type branching processes. Over my career I have loaned this book to my graduate students many times to help them learn the subject. Remarkably, I still have it, though it is a little worse for wear.

#2 on my list of Athreya's greatest hits is "A new approach to the limit theory of recurrent Markov chains," written with Peter Ney and published in the Transactions of the American Mathematical Society in 1978. Again this is a contribution to an area founded by Ted Harris. While Markov chains on a discrete state space are well understood, on a general state space numerous pathologies arise. Harris' genius was to identify a class of these chains that have a tractable theory and cover a number of examples.

There is an elegant analytical theory described in the book by Revuz. However, in 1978 several researchers had the same idea at the same time, including Esa Nummelin, who later wrote a book based on this approach. I remember attending a session of talks at the 1978 meeting on Stochastic Processes and their Applications and hearing three talks on the topic. This was devastating for a Ph.D. student in the audience who was working on this for his thesis.

The idea is simple but brilliant: a Harris chain can be modified to have one state that is hit with positive probability starting from any state, and having one such state is enough to carry over all the usual theory from the discrete case. Given this hint I am sure you can work out the details for yourself. I was so excited by the idea that I put it in the Markov chains chapter of my graduate textbook.

Returning to a more traditional narrative: Krishna Athreya received his Ph.D. in 1967 from Stanford, where he worked with Sam Karlin, a legendary probabilist with an impressive pedigree: son of Bochner, who was a grandson of Hilbert, and the mentor of 44 students including Tom Liggett and Charles Stone among many others. Athreya's thesis topic was Multitype Continuous Time Markov Branching Processes and Some Classical Urn Schemes. Soon after he got his degree, Athreya and Karlin worked on branching processes in random environments. Two papers were published in the Annals of Mathematical Statistics in 1971, since the Annals of Probability did not begin until 1973.

These two papers, like many in Athreya's top 20 most cited on MathSciNet, contain a number of ideas that have not been fully explored. An example is the work with his Ph.D. student Jack Dai on random logistic maps. Last but not least, I would like to mention his 1994 paper on large deviations for branching processes, which contains material that working probabilists should know. Athreya has left an impressive mathematical legacy that will enrich your life and research if you have the time to read it. It is sad that there will be no more work coming from him, but I hope others who read this will be inspired to continue his work.

The Tea-Cup Problem

Here’s a little problem to test your skills at combinatorial probability.

You have a set of six cups and saucers. Two are NC State red (R), two are UNC light blue (b), and two are Duke dark blue (B). You place the saucers in a line on the table: RRbbBB. Then a blind man comes in and puts the cups on the saucers in random order. Let M be the number of cups that match the color of the saucer they are on. Your job is to compute the distribution of M.

To get you started I will specify a probability space, which is the first step in solving any problem of this type. I once thought it was good to number the cups, but a student in my class this year taught me it is better to treat the two cups of a given color as indistinguishable, so we have 6!/(2!2!2!) = 90 outcomes instead of 720. To help check the solution, note that not only should the probabilities sum to 1, but we must have EM = 6(1/3) = 2. In the next paragraph I will start to reveal the solution, starting at 6 and working down, so if you want to discover it on your own you should stop scrolling.

P(M=6) = 1/90. In our probability space there is only one outcome where the cups all match, which is better than the situation when the cups are numbered and there are 2 × 2 × 2 = 8 such outcomes.

P(M=5) = 0. If, say, the 2 R match and the 2 b match, then the 2 B must match, so 5 is impossible.

P(M=4)=12/90. Matching 2-2-0 is impossible by the reasoning for 5, so we must have 2-1-1. There are 3 ways to pick the color with two matches, and for each color with only one match 2 choices of where the matching cup is. The rest of the outcome is now forced, e.g., RRbBBb.

P(M=3) = 16/90. Matching 2-1-0 is impossible, so we must have 1-1-1. We can pick the locations of the three matching cups in 2 × 2 × 2 = 8 ways. The other three nonmatching cups must then be either BRb or bBR, for 8 × 2 = 16.

P(M=2) = 27/90. We can have 2-0-0: once we pick the doubly matching color in 3 ways the rest is forced, e.g., RRBBbb. We can have 1-1-0: we can pick the color with no match in 3 ways and the locations of the two matching cups in 2 × 2 ways. Suppose dark blue has no match. Then the two B cups must go on an R and a b saucer, and there are 2 ways to put the leftover R and b on the B saucers, for a total of 24 + 3 = 27.

P(M=1) = 24/90. We can pick the location of the matching cup in 6 ways. Suppose it is the first R saucer. The cup on the second R saucer can be B or b (2 ways). If it is B, then we have bb on the B saucers, and we can have RB or BR on the b saucers (×2). If it is b, then we have BB on the b saucers, and we have two possibilities on the B saucers, but autocorrect in Word will not let me type them.

P(M=0) = 10/90. We can have BB on the red saucers, and then we must have RR on b and bb on B. The situation is similar for bb on red. This gives 2 outcomes. If we have {Bb} on the red saucers, then we must have {RB} on b and {Rb} on B, where the set braces indicate we have not specified the order, so there are 2 × 2 × 2 = 8 more outcomes.

Check: 1 + 12 + 16 + 27 + 24 + 10 = 90, and 6 × 1 + 4 × 12 + 3 × 16 + 2 × 27 + 1 × 24 = 180, so the mean is 180/90 = 2.
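The distribution can also be checked by brute force. This sketch enumerates all 720 orderings of the six cups treated as labeled objects; since the two cups of each color are interchangeable, each of the 90 outcomes in our probability space corresponds to 2! × 2! × 2! = 8 labeled permutations.

```python
from itertools import permutations
from collections import Counter

saucers = "RRbbBB"          # saucers fixed in a line on the table
counts = Counter()

# Enumerate all 720 orderings of the six (labeled) cups.
for cups in permutations("RRbbBB"):
    m = sum(c == s for c, s in zip(cups, saucers))  # number of color matches
    counts[m] += 1

# Each unlabeled outcome corresponds to 2!*2!*2! = 8 labeled permutations,
# so dividing the tallies by 8 recovers the counts out of 90.
dist = {m: counts[m] // 8 for m in sorted(counts)}
print(dist)    # {0: 10, 1: 24, 2: 27, 3: 16, 4: 12, 6: 1}

mean = sum(m * c for m, c in counts.items()) / 720
print(mean)    # 2.0
```

Note that 5 never appears as a key, confirming P(M=5) = 0, and the mean comes out to 2 as promised.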

A simple recipe for Chole

This is the first of a series of posts in the Category Cooking: QED, which stands for quick, easy, and delicious. The last word may be a bit of a stretch, but dumb or dull does not seem to set the right tone. The recipe I am about to share has a long history with my family. Soon after David was born, Susan went to a play group at Cornell for mothers of young children. There she met Smita Chandra, who was a nanny taking care of another family's child. Despite the difference in "status" they became good friends. Susan spent many hours watching her develop the recipes for From Bengal to Punjab: The Cuisines of India by Smita Chandra, published on October 1, 1991. This and Smita's two other books can be found on Amazon.

The cookbook was the source of my recipe for chole. This dish is known as Channah Masala at the Indian restaurant Tandoor in the food court at West Campus Union on the Duke campus. Like covid, the recipe has been radically changed by a series of mutations. This and all the other recipes in my blog are designed for two people.

Step 1. Open 2 cans (16 oz or 14.5 oz or whatever they are these days) of garbanzo beans. Drain off the liquid, add ½ cup water, and cook 10 minutes at 70% power in the microwave. This method relieves the boredom of heating them in a saucepan and allows for parallel processing.

Step 2. Cut 1 medium onion and 1 medium tomato into small pieces. In a 3 or 4 quart pan, cook the onion (once you are done chopping it), then add the tomato.

Step 3. By now the beans should be done. Drain off about half the water, add them to the pan, and stir to mix the ingredients. Then add: 1 t cumin, ½ t coriander, ½ t turmeric, ¼ t cayenne, 1 t garam masala, 1 T lemon juice. T is not a typo: it is Tablespoon, versus t for teaspoon. Of course I don't actually measure these things; I just dump what looks like the right amount on top of the beans and then stir to mix them up.

Step 4. Cook 5 minutes and let it sit on the still-warm burner for a few minutes. Divide into four approximately 8 ounce servings. Keep one for tonight's dinner and put the other three in the freezer (the appliance in the basement that is dedicated to this purpose, not the one that is part of your refrigerator).

To go along with the chole, get one pound of chicken tenders. Divide them into two batches and freeze one. Cut the chicken tenders into pieces that are about 1 inch long (or whatever size looks right to you). Saute them in a small amount of olive oil in a frying pan until they are done, then cover with an appropriate amount of Tikka Masala sauce and continue heating until the sauce is warm.

Samosas (an Indian pastry with potatoes and peas) are the third part of the dinner. The ones I use come frozen, and you cook them in a 375° oven for 15 minutes. Which of course means the first step in preparing dinner is to preheat the oven. We use the ones made by Sukhi Singh (www.sukhis.com). Before the pandemic there were 10 in a box but now there are only 8. Sukhi confidently says "There are two types of people: people who love Indian cuisine, and those who just haven't tried it yet."

I wish I had the courage to say: "There are two types of people: people who love probability, and those who haven't read my books yet." But I don't want to follow in the footsteps of the Duke undergrad who plagiarized her commencement speech almost word for word from one given at Harvard a few years earlier. I follow the rule: if you copy from one book it is plagiarism; if you copy from 10 it is scholarship. Of course you should change the numbers or the notation and introduce your own typos.

Fear, Loathing, and Surprise at the Kentucky Derby

NBC coverage begins today at 2:30 PM with the race slated for 6:57 PM. Last year Medina Spirit made 1.86 million dollars for a two-minute race, eclipsing what Stormy Daniels was paid for what was presumably a somewhat longer ride on Donald Trump. The win was negated by the drug test Medina Spirit failed after the Derby. Just as abruptly as the horse had reached the top of the sport, the feisty colt collapsed during a workout at Santa Anita Park in Arcadia, Calif.

This type of Shakespearean drama is rare at the Derby. "The Kentucky Derby is Decadent and Depraved," Hunter Thompson wrote in a June 1970 article. This year's spectacle featured a limited number of signature $1000 mint juleps that sold out well before race time. To be drunk, no doubt, by women in $10,000 hats saying "if the peasants have no food, let them eat cake." I can't match Hunter's style, so I'll leave you to read his article:

http://grantland.com/features/looking-back-hunter-s-thompson-classic-story-kentucky-derby/

The article is long, but you have almost four and a half hours to kill before the race. According to Wikipedia, Hunter rose to prominence with the publication in 1967 of Hell's Angels, a book he wrote while spending a year riding with the motorcycle gang. The article on the Derby is next in the narrative, followed by his book Fear and Loathing in Las Vegas. I read the book as an undergrad. Based on what occurred in the book, I am surprised he made it to age 68. The book is a surreal descent into drug abuse. Read the book; don't see the 1998 movie starring Johnny Depp. It is almost as dreadful as the made-for-TV trial co-starring Amber Heard, a film noir version of the old show Lifestyles of the Rich and Famous.

* * * * *

Saturday night after watching most of the news on CBS, I switched over to watch the running of the Derby. Running a horse in the Derby is the dream of everyone who races horses. I remember my dentist in Ithaca having a horse in the race one year. It fell 50 yards out of the gate, broke its leg, and had to be euthanized.

The owners of Rich Strike had a much happier experience. The colt wasn't even in the field until Friday, when he drew into the race after another horse was scratched. Wearing #21, he started in the 20th chute far from the rail and carried 80:1 odds, but he came from behind to pull off one of the biggest shocks in Derby history.

A poetic writer in the New York Times said he seemed to follow Moses' path through the Red Sea to a three-quarter-length victory. In more prosaic terms, his first step toward victory was to get from the extreme edge to the middle of the pack. Then at about the ¾ mark in the race he moved through the pack to a commanding lead. However, in the modern era I don't need words; you can see it for yourself:

https://www.youtube.com/watch?v=DFb2XSDv6vE

The horse moved so fast, and was so agitated after the finish (trying to bite the horse of the rider who was trying to guide him to the winner's area), that I thought for a moment this would be a situation where the horse got his speed from a syringe. But there hasn't been anything on the news, so I assume that this time the horse passed his drug test.

The owner who bought the horse for $30,000 was charming in his excitement: “What planet is this?” Dawson said. “I feel like I have been propelled somewhere. I’m not sure. This is unbelievable. I asked my trainer up on the stage, I said, ‘Are you sure this is not a dream? Because it can’t be true.’ He assured me this is real. I said OK.”

So there can be feel-good stories at the Derby, and not only for the owners. Rich Strike paid $163.60 to win on a $2 bet. The 21-3 exacta paid $4,101.20 on a $2 bet; the 21-3-10 trifecta $14,870.70 on a $1 bet; and the 21-3-10-13 superfecta $321,500.10 on a $1 bet, so even if you had bet on all P(20,4) = 116,280 possibilities you would have won big.
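That last claim is easy to check: covering every superfecta combination at $1 apiece costs P(20,4) dollars, well under the payout.

```python
import math

# Number of ordered ways to pick the top four finishers from a 20-horse field
tickets = math.perm(20, 4)
print(tickets)              # 116280

payout = 321_500.10         # the $1 superfecta payout quoted above
profit = payout - tickets   # cost is $1 per ticket
print(round(profit, 2))     # 205220.1
```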

 

WORDLE for TYROS

Tyro is a bit of crosswordese that means beginner or novice. Writing this reminds me of my first WORDLE, in which I failed to guess TACIT in six tries. A tweet related to this puzzle, which found its way into Rex Parker's NYTimes crossword blog, said something like the following: The answer reminds me of why I don't do crosswords; they are done by old people writing old words into the grid.

Turning to the main subject: as most of you probably know, in WORDLE you get six tries to guess a five-letter word. On each turn you must guess a five-letter word, a rule which prevents you from guessing, say, AEIOU to find out what vowels are present. If a letter is in the correct location it shows green. If it is in the puzzle but not in the right place then it is yellow. If it is not in the answer it is gray. (Colors may vary.) A copy of a computer keyboard on the screen allows you to enter your guesses and shows the status of each letter you have guessed.
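The coloring rules can be made precise with a short scoring function. This is a sketch of the standard two-pass approach (greens first, then yellows drawn from the letters not already matched), which is how duplicate letters are usually handled; here G, Y, and - stand for green, yellow, and gray.

```python
from collections import Counter

def score(guess: str, answer: str) -> str:
    """Return a 5-character string: G = green, Y = yellow, - = gray."""
    result = ["-"] * 5
    # Pass 1: mark greens and tally the answer letters not matched exactly.
    unmatched = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Pass 2: a non-green guess letter is yellow only while unmatched
    # copies of it remain in the answer.
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)

print(score("TRAIL", "TACIT"))   # G-YG-
```

For the TACIT puzzle mentioned above, an opening guess of TRAIL would have scored two greens and a yellow.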

As I start to give my advice, I must admit I am still a novice, but that never stopped TRUMP from pontificating on how to be president. In thinking about how to play WORDLE, it is useful to know how frequently letters are used in the English language.

When Samuel Morse wanted to figure this out in the 1800s, he looked at the frequency of letters in sets of printers' type, which he found to be (numbers in thousands): E (12), T (9), A, I, N, O, S (8), H (6.4), R (6.2), D (4.4), L (4), U (3.4), C, M (3), etc. With computers and electronic dictionaries at our disposal, we have a more precise idea (numbers are percentages).

E: 11.16    A: 8.50     R: 7.58     I: 7.55     O: 7.16       41.95
T: 6.95     N: 6.65     S: 5.74     L: 5.49     C: 4.54     + 29.73 = 71.68
U: 3.63     D: 3.38     P: 3.17     M: 3.01     H: 3.00     + 16.19 = 87.87
G: 2.47     B: 2.07     F: 1.81     Y: 1.78     W: 1.29        9.42
K: 1.102    V: 1.007    X: 0.290    Z: 0.272    J,Q: 0.196     2.93

Here the numbers in the last column are the sum of the numbers in the row, and we have made 26 divisible by 5 by putting J and Q, which have the same frequency to 3 significant figures, into the same entry. This table becomes somewhat irrelevant once you visit

https://leancrew.com/all-this/2022/01/wordle-letters/

to find the letter frequencies in five letter words.

A: 10.5     E: 10.0     R: 7.2      O: 6.6      I: 6.1        40.4
S: 5.6      T: 5.6      L: 5.6      N: 5.2      U: 4.4      + 26.4 = 66.8
Y: 3.6      C: 3.6      D: 3.3      H: 3.1      M: 3.1      + 16.7 = 83.5
P: 3.0      B: 2.7      G: 2.6      K: 2.1      W: 1.6        12.0
F: 1.6      V: 1.1      Z: 0.6      X,J: 0.4    Q: 0.2        4.3

Here E has fallen from the #1 spot. However, with the exception of Y climbing from 19th to 11th and P dropping from 13th to 16th, it doesn't seriously change the rankings, so I am not going to change my blog post due to this late-breaking information.

The next thing to decide about WORDLE is your definition of success. I think of the game as being like a par 5 in golf. To take the analogy to a ridiculous extreme, you can think of the game as a par 5 in a tournament that uses the modified Stableford scoring system (like the Barracuda Open, played at a course next to Lake Tahoe). Double bogey or worse (= not solving the puzzle) is -3, bogey (six guesses) -1, par (five) 0, birdie (four) 2, eagle (three) 5, and double eagle (two) 8 points.

I am not one who is good at brilliant guesses, so my personal metric is to maximize the probability of solving the puzzle. Hence I follow the approach of Zach Johnson, who won the 2007 Masters by "laying up" on each par 5. Most of these holes are reachable in two (for the pros), but 13 and 15 have water nearby, so trying to hit the green in two and putting your ball in the water can lead to a bogey or worse. Zach hit his second shots to within 80-100 yards of the green so he could use his wedge to hit the ball close and make an old-school birdie.

My implementation of his strategy is to start with TRAIL, NODES, and CHUMP, which cover all five traditional vowels and 15 of the most frequent letters. Using the five-letter word frequencies, the expected number of letters in the answer that this uncovers is 0.835 × 5 = 4.175 if all five letters in the word are different. (Recall from elementary probability that if Xi is the indicator of the event that the ith letter appears among the 15 guessed, then E(X1 + … + X5) = 5EX1.) Dividing by 5 shows that the expected number of letters in the right position is 0.835 (assuming again all letters are different), so on average we expect a green and three yellows.
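As a sanity check on that arithmetic, we can total the five-letter-word frequencies in the table above over the 15 distinct letters of TRAIL, NODES, and CHUMP. (The three words use P rather than 11th-ranked Y, so the exact total is 82.9 rather than the top-15 figure of 83.5; the conclusions are unchanged.)

```python
# Five-letter-word letter frequencies (percent) from the table above
freq = {"A": 10.5, "E": 10.0, "R": 7.2, "O": 6.6, "I": 6.1,
        "S": 5.6, "T": 5.6, "L": 5.6, "N": 5.2, "U": 4.4,
        "Y": 3.6, "C": 3.6, "D": 3.3, "H": 3.1, "M": 3.1,
        "P": 3.0}

letters = set("TRAIL" + "NODES" + "CHUMP")   # 15 distinct letters
coverage = sum(freq[c] for c in letters)     # percent of letters covered
print(round(coverage, 1))                    # 82.9

# Expected number of the answer's letters found by the three guesses,
# assuming all five letters of the answer are different
expected = 5 * coverage / 100
print(round(expected, 3))                    # 4.145
```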

Of course the answer can have repeated letters and can be chosen by the puzzle creator to be unusual, e.g., EPOXY or FORAY, which were recent answers. (It is now April 8.) In several cases my first three guesses have produced only 2 letters in the word, which makes the birdie putt very difficult. Even when one has four letters, as in _OUND, the possibilities are bound, found, mound, pound, round, sound, wound, though some of these are eliminated if their missing letter is among the 15 already guessed.

If there are three (or more) possibilities for the one unknown letter, then it can be sensible to use a turn to see which of these are possible, in order to get the answer in two more guesses rather than three. Or you can be like Tiger one year at Augusta and "go for it all": give your birdie putt on the 15th hole a good hard rap and watch it roll off the green into the creek. Fortunately for him, the rules of golf allowed him to play his next shot from the previous position.

These rules are just to give you a start at finding a better strategy. You should choose your own three words, not only to feel good about having done it yourself, but because the order of the letters can influence the probability of success. Of course you can also choose to guess only two words (or only one) and then make your guesses based on the result. When I get several letters on the first two guesses, I have often substituted another word for CHUMP to get to the solution faster, but I have often regretted that. On the other hand, sometimes when I play CHUMP I am disappointed to get no new positive information about what is in the word.

Bracketology 2022: There is no such thing as probability.

My first job was at UCLA in 1976. The legendary Ted Harris, who wrote one of the first books on branching processes, found a tractable subset of Markov chains on general state space that bears his name, and invented the contact process, was at the University of Southern California. USC was only about 20 miles away, and Ted Cox was there from 1977-1979, so I would often go over on Friday afternoons for the probability seminar. On weeks when there was an outside speaker, Ted and his wife Connie would have little after-dinner dessert parties at their house at the southeastern edge of Beverly Hills. One of Connie's favorite things to say was, you guessed it, "There is no such thing as probability." To support this claim she would tell stories in which something seemingly impossible happened. For example, one evening after eating at a restaurant, she realized while walking to the exit that she had left her purse at the table. She went back to the table to retrieve it and along the way saw an old friend that she had not seen in many years. She would never have seen the friend unless she had forgotten her purse. The punch line of the story was "What is the probability of that?"

The connection with basketball is that this year’s March Madness seems to violate some of the usual assumptions of probability.

The probability of an event such as black coming up on a roulette wheel does not change in time. Of course, the probability a team wins a game depends on their opponent, but we don't expect the characteristics of the team to change over time. This is false for this year's Duke Blue Devils. They lost to UNC in coach K's retirement game on March 5, and sleepwalked their way through the ACC tournament, needing late-game surges to beat Syracuse and Miami, before losing to Virginia Tech in the finals.

They won their first game in the NCAA tournament against Cal State Fullerton. This was a boring game, with the difference in scores behaving like the winnings of a player betting on black every time. Playing against Michigan State, it looked like it was all over when Duke was down by 5 with a minute to play, but they rallied to win. In the next game against Texas Tech, in a late-game time out the players convinced coach K to let them switch from zone defense to man-to-man. If I have the story right, at that moment coach K slapped the floor, and then the five players all did so simultaneously, an event of intense cosmic significance, and Texas Tech was done for. Maybe the French theory of grossissement de filtration can take account of this, but I am not an expert on that.

You have to take account of large deviations. In the first round, #15 seed St. Peter's stunned the nation with an upset victory over #2 Kentucky, and then beat #7 Murray State to reach the Sweet 16, where they played Purdue. St. Peter's plays with four guards and early in the game substituted five new players for the starting five. The four guards buzzed around the court, annoyed the players bringing the ball up the floor, and generally disrupted Purdue's game. To the 7'4" center they were probably like the buzzing of bees that he could hear but not see, since they were so far below him.

Basketball is a great example of Lévy's 0-1 law: the probability of a win, an event we'll call W, given the current information about the game (encoded in a sigma-field Ft), converges to 0 or 1 as t tends to ∞ (which is usually 40 minutes but might include overtime). Late in the game this quantity can undergo big jumps. Purdue was down by 6 with about a half-minute to play and desperately needed a three-point shot. The player with the ball turned to throw it to a player who he thought would be nearby and open, only to find that the player had decided to go somewhere else, and suddenly the probability dropped much closer to 0.
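A toy model makes the convergence concrete. Suppose, purely for illustration, the score difference moves by ±1 point on each of n possessions, independently with probability 1/2 each (a gross simplification of basketball, and my assumption, not a claim about real games). Then P(W | Ft) depends only on the current lead, can jump sharply late in the game, and is forced to 0 or 1 at t = n.

```python
from math import comb
import random

def win_prob(lead: int, steps_left: int) -> float:
    """P(final lead > 0) when the remaining steps are i.i.d. fair +1/-1 flips.
    A tie (final lead == 0) counts as half a win, standing in for overtime."""
    total = 0.0
    for k in range(steps_left + 1):          # k = number of +1 steps remaining
        final = lead + k - (steps_left - k)
        if final > 0:
            total += comb(steps_left, k)
        elif final == 0:
            total += comb(steps_left, k) / 2
    return total / 2 ** steps_left

# One simulated "game" of n possessions (n odd, so the final lead is nonzero)
random.seed(7)
n, lead = 101, 0
path = []
for t in range(n):
    lead += random.choice([1, -1])
    path.append(win_prob(lead, n - 1 - t))

# Levy's 0-1 law in miniature: the conditional probability ends at 0 or 1
print(path[-1] in (0.0, 1.0))    # True
```

Plotting `path` for a close game shows exactly the late big jumps described above.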

Games are not independent. Of course, the probability a team wins a game depends on their opponent, but even if you condition on the current teams, the tournament does not have the Markov property. On Thursday, March 24, Arkansas upset #1 seed Gonzaga. After this emotional win, and with little time to prepare, they played Duke on Saturday and slowly succumbed. In a display of coach K's brilliance, "Duke won the final minute of the first half," increasing their lead from 7 points to 12. Even though the game was a martingale after the end of the first half, the L2 maximal inequality guaranteed that a win was likely.

The high point of the Duke-Arkansas game came about two minutes into the second half, when an Arkansas three-point shot bounced off the rim and ended up resting on top of the backboard. A quick-thinking (and very pretty) Arkansas cheerleader got up on the shoulders of the male partner for her gymnastic routines. Putting her two little pom-poms in one hand, she reached up and tipped the ball back down to the floor.

To end where I began: What is the probability of that? To use the frequentist approach, we could count the number of games in the regular season and in the tournament in which this event occurred and divide by the number of games. The answer would be similar to the Bayesian calculation of the probability that the sun will rise tomorrow. Using Google, and the curious biblical assumption that the solar system is 5000 years old, gives (as I read on the internet) a probability that the sun will not rise today of 1/186,215. Even though it may seem to my wife that there are this many games, I have to agree with Connie Harris: in March Madness there is no such thing as probability.

Is bold play optimal in football?

It has been 18 months since my last blog post. At that point I was very angry about Trump's mishandling of the covid pandemic and the fact that people wouldn't wear masks, while on other days I was saying goodbye to two former colleagues who were mentors and good friends. Not much has changed: now I am angry about people who won't get vaccinated, and I spend my time sticking pins into my Aaron Rodgers voodoo doll hoping that a covid outbreak on his team will keep him from winning the Super Bowl.

To calm myself I have decided to do some math and relax. It is a well-known result (but not an easy one for an impatient person to find on the internet) that if you are playing a game that is biased against you, bold play is optimal. Specifically, if you want to reach a fixed target amount of money when playing such a game, then the optimal strategy is to bet all the money you have until you reach the point where winning would take you beyond your goal, and then bet only enough to reach your goal.

For a concrete example, suppose (i) you have $1 and your wife wants you to take her to brunch at the Cheesecake Factory, which will cost you $64, and (ii) you want to win the necessary amount of money by betting on black at roulette, where you win $1 with probability 18/38 and lose $1 with probability 20/38. A standard calculation, which I'll omit since it is not very easy to type in Microsoft Word (see Example 1.45 in my book Essentials of Stochastic Processes), shows that the probability of success betting $1 at a time is 1.3116 × 10^-4. In contrast, the strategy of starting with $1 and "letting it ride" in the hope that you can win six times in a row has probability (18/38)^6 = 0.01130. This is 86 times as large as the previous answer, but still a very small probability.
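The omitted calculation is the classical gambler's ruin formula: with win probability p = 18/38 per bet and r = q/p, the chance of climbing from $1 to $64 one dollar at a time is (r - 1)/(r^64 - 1). A few lines confirm both numbers.

```python
p = 18 / 38              # win a $1 bet on black
q = 20 / 38
r = q / p                # = 10/9, the ratio in the ruin formula

# Timid play: bet $1 at a time, start at $1, quit at $64 or $0
timid = (r - 1) / (r ** 64 - 1)
print(timid)             # ~1.3116e-4

# Bold play: let the $1 ride, i.e. win six doubling bets in a row
bold = p ** 6
print(bold)              # ~0.01130

print(bold / timid)      # ~86
```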

Consider now the NFL football game between the Green Bay Packers and the Baltimore Ravens held on Sunday, December 19, 2021. After trailing 31-17, the Ravens scored two touchdowns to bring the score to 31-30. To try to win the game without going to overtime they went for a two-point conversion, failed, and lost the game. Consulting Google, I find that, surprisingly, 49.4% of two-point conversions are successful versus 94.1% of kicks for one point. In the game under consideration the two-point conversion would not necessarily have won the game, since there were about 45 seconds left on the clock with Green Bay holding one time out, so there was some chance (say 30%) that Green Bay could use passes completed near the sideline to get within range of a field goal and win the game 34-32. Rounding 49.4 to 50, going for two gives the Ravens a win probability of 0.5 × 0.7 = 35%. With a one-point conversion their win probability is 0.941 × 0.7 × p, where p is the probability of winning in overtime. If p = ½ this is 33%. However, if the 8-6 Ravens felt that the 11-3 Packers had a probability significantly bigger than ½ of winning in overtime, then the right decision was to go for two points.
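The back-of-the-envelope comparison can be written out, treating the numbers in the text as rough assumptions (the conversion rates from the quoted league averages, a 30% chance Green Bay answers with a field goal, and p the Ravens' chance in overtime); the function names are mine.

```python
two_pt = 0.5        # two-point conversion succeeds (49.4% rounded)
one_pt = 0.941      # extra-point kick succeeds
gb_fg = 0.30        # Green Bay drives for a winning field goal anyway

def win_go_for_two() -> float:
    # Ravens win iff the conversion succeeds AND Green Bay does not answer
    return two_pt * (1 - gb_fg)

def win_kick(p_ot: float) -> float:
    # Kick is good, Green Bay does not answer, then win in overtime
    return one_pt * (1 - gb_fg) * p_ot

print(win_go_for_two())     # 0.35
print(win_kick(0.5))        # ~0.329
```

Setting the two expressions equal, kicking is better only when the Ravens' overtime win probability exceeds 0.35/0.659 ≈ 53%.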

A second example is provided by the Music City Bowl game between Tennessee and Purdue, held December 30, 2021. After a fourth quarter that saw each team score two touchdowns in a short amount of time (including a two-point conversion by Purdue to tie the score), each had 45 points. The pre-overtime coin flip determined that Tennessee would try to score first (starting, as usual, from the opponent's 25 yard line). Skipping over the nail-biting excitement that makes football fun to watch, we fast-forward to Tennessee with fourth down and goal on the 1 yard line. The timid thing to do would be to kick the field goal, which succeeds with probability essentially 1. In this case if Purdue

(i) scores a touchdown (with probability p7) Tennessee (or T for short) loses

(ii) kicks a field goal (with probability p3), they go to a second overtime period

(iii) does not score (with probability p0) T wins

Using symmetry, the probability T wins is p0 + p3/2 = 1 − p7 − p3/2.

Case 1. If T fails to score (which is what happened in the game), then Purdue will win with high probability, since they only need a field goal. In the actual game, three running plays brought the ball 8 yards closer and then the kicker made a fairly routine field goal.

Case 2. If T scores (with probability q), then Purdue must score a touchdown, an event of probability P7 > p7, so the probability T wins when they try for a touchdown is q[(1 − P7) + P7/2].

There are a few too many unknowns here, but if we equate p7 with the probability of scoring a touchdown when the team is in the red zone (inside the 20), then the top 10 ranked teams all have probabilities > 0.8. If we take q = 0.5 and set P7 = p7 = 0.8, then the probability T wins in Case 2 is 0.3, versus 0.2 − p3/2 in Case 1, which is 0.15 if p3 = 0.1 (half the time they don't score a touchdown they get a field goal).
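Plugging in the illustrative numbers above (q = 0.5, P7 = p7 = 0.8, p3 = 0.1 — all guesses for the sake of the comparison, not measured values):

```python
p7 = 0.8   # Purdue touchdown after a Tennessee field goal
p3 = 0.1   # Purdue field goal instead (half of the non-TD outcomes)
P7 = 0.8   # Purdue touchdown when it MUST have one (at least p7)
q = 0.5    # Tennessee converts fourth-and-goal

# Case 1: Tennessee kicks the field goal; T wins with prob p0 + p3/2
kick = 1 - p7 - p3 / 2
# Case 2: Tennessee goes for the touchdown
go = q * ((1 - P7) + P7 / 2)

print(round(kick, 2))   # 0.15
print(round(go, 2))     # 0.3
```

With these assumptions going for the touchdown is twice as good, which is the point of the paragraph that follows.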

Admittedly, like a student on an exam facing a question they don't know the answer to, I have laid down a blizzard of equations in hopes of a better score on the problem. But in simple terms: since p7 is close to 1, the Tennessee coach could skip all the math, assume Purdue would go on to score a touchdown on their possession, and realize he needed his team to score a touchdown, which regrettably they did not.

Like many posts I have written, the story ended differently than I initially thought but you have to follow the math where it takes you.

A National Face Mask Law Could End the Pandemic

How do I know this? Because I read an article in the April 2020 issue of the Atlantic Monthly that explained the “real reason to wear a mask.”

https://www.theatlantic.com/health/archive/2020/04/dont-wear-mask-yourself/610336/

Medical workers use them and other PPE to prevent ingress, the transmission of outside particles to the wearer. Individuals, however, should wear masks to prevent egress. A key transmission route of COVID-19 is the droplets that fly out of our mouths when we cough, sneeze, or even just speak. The purpose of wearing a mask is to avoid transmitting the virus to those around you.

To develop this article, the magazine assembled an interdisciplinary team of 19 experts and looked at a range of mathematical models and other research. They wrote a scientific paper that was published online

https://www.preprints.org/manuscript/202004.0203/v1

The conclusion was that if 80% of people wore masks that were 60% efficient (easily achievable with cloth masks), the basic reproduction number R0 for the epidemic would be < 1 and the epidemic would die out. A graphic shows the possible combinations of mask-wearing percentages and mask efficiencies that would achieve this goal.
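A back-of-the-envelope version of this threshold can be sketched as follows. The quadratic form below assumes masks reduce both emission and inhalation, and R0 = 2.4 is a commonly used early-2020 estimate; both are illustrative assumptions on my part, not necessarily the paper's exact model.

```python
# Toy model: if a fraction p of people wear masks of efficiency e, and a
# mask blocks transmission both on the way out and on the way in, the
# effective reproduction number is roughly R0 * (1 - e*p)**2.
R0 = 2.4   # basic reproduction number without masks (assumed)
p = 0.8    # fraction of people wearing masks
e = 0.6    # mask efficiency at blocking transmission

R_eff = R0 * (1 - e * p) ** 2
print(round(R_eff, 2))  # 0.65, comfortably below 1
```

Under these assumptions R_eff falls well below 1, consistent with the paper's conclusion that the epidemic would die out.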

I admit that the time scale over which things will happen is somewhat of a guess. Not much reduction will be seen in the first week since many infected people have yet to show symptoms. As the graphic shows the reduction will depend on the percent of people complying with the order and the quality of face masks, which should be much better now than when the article was initially published. On the other hand large numbers of people congregating in bars without wearing face masks could negate the effort.

The effectiveness of masks in containing the virus is not just a theoretical result. There are a number of spectacular examples of success. In Hong Kong only four deaths due to COVID-19 have been recorded since the beginning of the pandemic. Hong Kong health authorities credit their citizens’ near-universal mask-wearing as a key factor. Similarly, Taiwan ramped up mask production early on and distributed masks to the population, mandating their use in public transit and recommending their use in public places, a suggestion that has been widely complied with. Their death toll has been 6, and the schools have been open since early February.

While other countries have been smart, the US has not. Thanks in no small part to Trump’s decision to not wear a mask and to have large rallies where very few people wore them, the issue has become politicized. Recently the governor of Georgia sued the mayor of Atlanta to stop her from imposing a mask order. Each weekend in Raleigh, hundreds of young people crowd into restaurants and bars on Glenwood South and there is not a mask in sight, a situation that occurs in many parts of the country. This behavior occurs because of the perception that young people rarely get sick and that if they do get infected the symptoms are mild. However, in recent weeks 1/3 of the newly infected have been under the age of 30.

As Dr. Fauci has said when Trump has allowed him to be on TV, large gatherings in which face masks are not worn can lead to transmission of the virus from one asymptomatic person to another. It is difficult to determine the extent to which this occurs, but contact tracing data from North Carolina shows that 50% of symptomatic cases are caused by contact with an asymptomatic individual. Another sign of the invisible epidemic is that the CDC estimates that there have been 10 times as many cases as those that have been verified by a COVID-19 test.

Trump has recently worn a mask, and at his coronavirus briefing on Tuesday, July 21, uttered the words that everyone should wear a mask when they are in a situation where social distancing is impossible. The history of pandemics in America shows that people will not voluntarily do the right thing. It must be mandatory. The president could dramatically improve his chances of being re-elected by signing an executive order making masks mandatory.

I hate to point the president to a road to re-election, but I do not want to see 90,000 more people die. The IHME web site

https://covid19.healthdata.org/united-states-of-america

projects 224,500 deaths by election day, while the CDC data shows that 140,000 have occurred as of July 21. To get re-elected, Trump must first stop lying about the pandemic. The US has 5% of the world’s population, but the fraction of deaths that have occurred here is 140,000/617,000 = 22.7%, more than four times its share. It does not have the lowest death rate in the world.

The US cannot reopen its economy or send students back to school five days a week with a pandemic raging in the streets. The crisis needs to be stopped now. It seems unlikely that states will go back into lockdown, so making masks mandatory is our only hope. If hospitalizations continue to spiral out of control (and they are NOT caused by our high level of testing) then the death toll could easily go higher than projected. In April, when stay-at-home orders and other control measures were in place, the IHME projected death toll was roughly 70,000. This means that the premature re-opening of the economy has cost 150,000 lives. If we had followed the lead of Europe and dramatically reduced the number of cases before opening up the country, things would be much better now, but that opportunity is gone. We need to act now to prevent a complete disaster.

 

Pooled Tests for COVID-19

When one is dealing with a disease that is at a low frequency in the population and one has a large number of people to test, it is natural to do group testing. A fixed number of samples, say 10, are mixed together. If the combined sample is negative, we know all the individuals are negative. But if a group tests positive, then all the samples in the group have to be retested individually.

If the groups are too small, then not much work is saved. If the groups are too large, then there are too many positive group tests. To find the optimal group size, suppose there are a total of N individuals, the group size is k, and 1% of the population has the disease. The number of group tests that must be performed is N/k. The probability a group tests positive is approximately k/100. If this happens, then we need k more tests. Thus we want to minimize

(N/k)(1 + k^2/100) = N/k + Nk/100

Differentiating, we want –N/k^2 + N/100 = 0, or k = 10. In the concrete case N = 1000, the number of tests is 200.

Note: the exact probability a group test is positive is p = 1 – (1 – 1/100)^k, but using this makes the optimization very messy. When k = 10, 1 + kp = 1.956, so the answer does not change by very much.

Recent work reported on in Nature on July 10, 2020 shows that the number of tests needed can be reduced substantially if the individuals are divided into groups in two different ways for group testing before one has to begin testing individuals. To visualize the set-up, consider a k by k matrix with one individual in each cell. We group test the rows and group test the columns. An individual who tests negative in either test can be eliminated. The number of k by k squares is N/k^2. For each square there are 2k tests that are always performed. Each of the k^2 individuals in a square has both group tests come back positive with probability (k/100)^2. These events are NOT independent, but that does not matter in computing the expected number of tests

(N/k^2)(2k + k^4/10,000) = 2N/k + Nk^2/10,000

Differentiating, we want –2N/k^2 + 2Nk/10,000 = 0, or k = (10,000)^(1/3) = 21.54. In the concrete case N = 1000, the expected number of tests is 139.

Practical Considerations:

One could do fewer tests by eliminating the negative rows before testing the columns, but the algorithm used here allows all the tests to be done at once, avoiding the need to wait for the first-round results to come back before the second round is done.

Larger group sizes will make it harder to detect the virus if only one individual in the group is infected. In the Nature article, Sigrun Smola of the Saarland University Medical Center in Homburg is quoted as recommending against grouping more than 30 samples in one test. Others claim that it is possible to identify the virus when there is one positive individual out of 100.

Ignoring the extra work in creating the group samples, the method described above reduces the cost of testing by 86%. The price of $9 per test quoted in the article would be reduced to $1.26, so this could save a considerable amount of money for a university that has to test 6000 undergraduates several times in one semester.

In May, officials in Wuhan used a method of this type to test 2.3 million samples in two weeks.

References

Mutesa, L., et al. (2020) A strategy for finding people infected with SARS-CoV-2: optimizing pooled testing at low prevalence. arXiv: 2004.14934

Mallapaty, Smriti (2020) The mathematical strategy that could transform coronavirus testing. Nature News, July 10. https://www.nature.com/articles/d41586-020-02053-6

 

China is NOT to blame for the COVID-19 epidemic in the US

As President, Donald Trump has told tens of thousands of lies. In many cases, he can hide behind the silence of his loyal supporters. However, when it comes to the coronavirus epidemic the details are on TV, in the press, and in publicly available databases for all the world to see.

One of his most egregious lies is that China is to blame for the epidemic. A May 20 story in USA Today says “As the political rhetoric blaming China for the pandemic escalates, law enforcement officials and human rights advocates have seen an increasing number of hate crimes and incidents of harassment and discrimination against Asian Americans.” Trump has fanned these flames in his rallies, referring to the virus as the “Kung flu.”

One of the most incredible lies (i.e., too extraordinary and improbable to be believed) is that the coronavirus was made in a laboratory in Wuhan. To protect this lie, the White House directed the National Institutes of Health to cancel funding for a project studying how coronaviruses spread from bats to people. The NIH typically only cancels an active grant when there is scientific misconduct or improper financial behavior, neither of which occurred in this case. The PI on the grant, Dr. Peter Daszak, is President of EcoHealth Alliance, a US-based organization that conducts research and outreach programs on global health, conservation and international development. His research has been instrumental in identifying and predicting the origins and impact of emerging diseases, which is very important for avoiding future pandemics.

Early Spread. A special report published on July 5 in the New York Times gives new information about the early days of the epidemic. In mid-February the official case count was 15, but there is evidence of 2000 other infections. Given what we now know about the spread of the disease, it is natural to guess that many of these cases were asymptomatic. However, as explained in the paper cited in the next paragraph, part of the discrepancy was due to the fact that testing done before March 4, 2020 was only done for symptomatic patients who had recently traveled internationally.

This idea that the coronavirus was widespread in the US in January 2020 was discussed in news stories about Alessandro Vespignani’s work. These appeared on the Northeastern web site in April, but the paper has only recently appeared on medRxiv: Jessica T. Davis et al., Estimating the establishment of local transmission and the cryptic phase of the COVID-19 pandemic in the US. Their conclusions are based on the use of a rather complicated individual-based, stochastic and spatial epidemic model called GLEAM (GLobal Epidemic and Mobility Model) that divides the global population into 3200 subpopulations. See PNAS 106 (2009), 21484–21489 and J. Computational Science 1 (2010), 132–145 for more details.

Origins of the virus in the US. Recently two genetic sequencing studies published online in Science have investigated the origins of the coronavirus in the US. A.S. Gonzalez-Reiche et al., published on May 29, 2020, studied introductions and early spread of SARS-CoV-2 in the New York City area. Phylogenetic analysis of 84 distinct SARS-CoV-2 genomes from samples taken February 29 – March 18 provided evidence for multiple, independent introductions. Phylogenetic analysis of full-length genome sequences suggested that the majority of the introductions came from Europe and other parts of the United States.

A medRxiv preprint by M.T. Maurano et al., which reports on the analysis of 864 SARS-CoV-2 sequences, reached the same conclusion: comparisons to global viral sequences showed that early strains were most likely linked to cases from Europe.

Deng et al., published on June 8, 2020, studied the introduction of SARS-CoV-2 into Northern California. They studied 36 patients spanning 9 counties and the Grand Princess cruise ship, using a method they call MSSPE (Metagenomic Sequencing with Spiked Primer Enrichment) to assemble genomes directly from clinical samples. Phylogenetic analysis described in detail in the paper indicated that 14 were associated with the Washington State WA1 lineage, 10 with a Santa Clara outbreak cluster (SCC1), 3 with a Solano County cluster, 5 with lineages circulating in Europe, but only 4 with lineages from Wuhan. This precision comes from the fact that as of March 20, 2020, when this work was done, there were 789 worldwide genomes in the GISAID database. This wealth of data is possible because coronaviruses are unsegmented single-stranded RNA viruses that are about 30 kilobases in length.

The results in the last three paragraphs demonstrate that most lineages came from Europe, not China. In hindsight, the fact that Europe was the primary source of the coronavirus in the US is not surprising: travel from China was banned on February 2, but travel from Europe was only ended on March 13.

I have concentrated on the science. I’ll leave it to you to decide whether you want the Senate to vote for Thom Tillis’ 18-point plan from May “to hold China accountable” for what he says is its role in the coronavirus pandemic.