Category Archives: Basic probability
Variance of binomial vs. hypergeometric
Consider \(N\) balls, of which \(r\) are red and the rest are green. Draw \(n\) balls. Let \(X\) denote the number of red balls drawn when sampling with replacement and \(Y\) the number of red balls drawn when sampling without replacement.
- What is the difference between the variance of \(X\) and the variance of \(Y\) ?
- For what values of \(N,n,r\) is the variance the largest ?
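One way to explore the comparison numerically is sketched below. It assumes \(n\) balls are drawn (the sample size implied by the second question); the values of `N`, `r`, `n` are purely illustrative and not part of the problem.

```python
from fractions import Fraction
from math import comb

def binomial_variance(N, r, n):
    """Variance of the red count in n draws with replacement."""
    p = Fraction(r, N)
    return n * p * (1 - p)

def hypergeometric_variance(N, r, n):
    """Variance of the red count in n draws without replacement, from the PMF."""
    pmf = {k: Fraction(comb(r, k) * comb(N - r, n - k), comb(N, n))
           for k in range(max(0, n - (N - r)), min(n, r) + 1)}
    mean = sum(k * q for k, q in pmf.items())
    return sum((k - mean) ** 2 * q for k, q in pmf.items())

N, r, n = 20, 8, 5  # illustrative values only
vx = binomial_variance(N, r, n)
vy = hypergeometric_variance(N, r, n)
print(vx, vy, vy / vx)  # compare the ratio with (N - n) / (N - 1)
```

Trying a few other values of `N`, `r`, `n` suggests how the ratio of the two variances depends on the sampling fraction.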
Expectation of geometric distribution
Compute the expectation of the geometric distribution using the fact that in this case
\(\mathbf{E}(X)= \sum_{k=1}^{\infty} \mathbf{Pr}(X\geq k) \)
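A quick numerical check of the tail-sum formula, as a sketch only. It assumes \(X\) counts the number of trials up to and including the first success, so that \(\mathbf{Pr}(X\geq k)=(1-p)^{k-1}\); the truncation at `terms` is an approximation to the infinite sum.

```python
def tail_sum_expectation(p, terms=10_000):
    """Approximate E(X) = sum_{k>=1} Pr(X >= k) for a geometric X on {1, 2, ...}."""
    return sum((1 - p) ** (k - 1) for k in range(1, terms + 1))

for p in (0.5, 0.25, 0.1):
    print(p, tail_sum_expectation(p), 1 / p)  # the two columns should agree
```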
Expectations of die rolls
A fair die is rolled ten times. Find numerical values for the expectations of each of the following random variables:
- the sum of the numbers in the ten rolls;
- the sum of the largest two numbers in the first three rolls;
- the maximum number in the first five rolls;
- the number of multiples of three in the first ten rolls;
- the number of faces which fail to appear in the ten rolls;
- the number of different faces that appear in the ten rolls.
[From Pitman page 183]
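Some of these expectations can be checked by brute-force enumeration when only a few rolls are involved. The sketch below does this for the second and third items; the helper `expectation` is a hypothetical name introduced here, not something from Pitman.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

def expectation(num_rolls, statistic):
    """Exact expectation of statistic(outcome) over all equally likely outcomes."""
    outcomes = list(product(FACES, repeat=num_rolls))
    return Fraction(sum(statistic(o) for o in outcomes), len(outcomes))

# Sum of the largest two numbers in three rolls: drop the smallest value.
sum_top_two = expectation(3, lambda o: sum(sorted(o)[1:]))
# Maximum number in five rolls.
max_of_five = expectation(5, max)
print(sum_top_two, max_of_five)
```

The remaining items involve ten rolls, where linearity of expectation and indicator variables are usually more convenient than enumeration.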
Approximation: Rare vs Typical
Let \(S\) be the number of successes in 25 independent trials with probability \(\frac1{10}\) of success on each trial. Let \(m\) be the most likely value of \(S\).
- find \(m\)
- find \(\mathbf{P}(S=m)\) correct to 3 decimal places.
- what is the normal approximation to \(\mathbf{P}(S=m)\) ?
- what is the Poisson approximation to \(\mathbf{P}(S=m)\) ?
- repeat the first part of the question with the number of trials equal to 2500 rather than 25. Would the normal or Poisson approximation give a better approximation in this case ?
- repeat the first part of the question with the number of trials equal to 2500 rather than 25 and the probability of success equal to \(\frac1{1000}\) rather than \(\frac1{10}\). Would the normal or Poisson approximation give a better approximation in this case ?
[Pitman p122 # 7]
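The three quantities can be compared numerically with a short script. This is a sketch under stated choices: the normal approximation uses a continuity correction, the Poisson rate is taken to be \(np\), and everything is computed in log space so that the case of 2500 trials does not overflow.

```python
from math import erf, exp, lgamma, log, sqrt

def binom_pmf(n, p, k):
    """Exact binomial probability P(S = k), computed in log space."""
    log_coef = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coef + k * log(p) + (n - k) * log(1 - p))

def normal_approx(n, p, k):
    """Normal approximation to P(S = k) with continuity correction."""
    mu, sd = n * p, sqrt(n * p * (1 - p))
    cdf = lambda x: 0.5 * (1 + erf((x - mu) / (sd * sqrt(2))))
    return cdf(k + 0.5) - cdf(k - 0.5)

def poisson_approx(n, p, k):
    """Poisson approximation to P(S = k) with rate n * p."""
    lam = n * p
    return exp(-lam + k * log(lam) - lgamma(k + 1))

def mode(n, p):
    """Most likely value of S, found by direct search over 0..n."""
    return max(range(n + 1), key=lambda k: binom_pmf(n, p, k))

for n, p in [(25, 0.1), (2500, 0.1), (2500, 0.001)]:
    m = mode(n, p)
    print(n, p, m, binom_pmf(n, p, m), normal_approx(n, p, m), poisson_approx(n, p, m))
```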
Lottery
Suppose that each week you buy a ticket in a lottery which has a chance \(\frac1{100}\) of winning. If you do this every week of a year, approximately what is the chance of getting exactly \(k\) wins for \(k=0,1,2,3\) ?
[Pitman p122, # 5]
Balls in a Box: Counting
A box contains 20 red balls and 30 black balls. Four balls are chosen without replacement. What is the chance that:
- all balls are red
- exactly three balls are red
- the first red ball appears on the last draw.
- the first two balls are the same color
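The four probabilities can be written as counting ratios. The following is a minimal sketch using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

red, black = 20, 30
total = red + black

# All four balls red: choose 4 of the 20 red balls.
all_red = Fraction(comb(red, 4), comb(total, 4))
# Exactly three red: 3 red and 1 black, in any order.
three_red = Fraction(comb(red, 3) * comb(black, 1), comb(total, 4))
# First red ball appears on the last (fourth) draw: black, black, black, red.
first_red_last = (Fraction(30, 50) * Fraction(29, 49)
                  * Fraction(28, 48) * Fraction(20, 47))
# First two balls the same color: both red or both black.
first_two_same = Fraction(comb(red, 2) + comb(black, 2), comb(total, 2))

print(all_red, three_red, first_red_last, first_two_same)
```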
Poker Hands: counting
Assume that all poker hands are equally likely. The total number of hands is
\[\begin{pmatrix} 52 \\5\end{pmatrix}\]
Find the probability of being dealt each of the following:
- a straight flush ( all cards of the same suit and in order)
- a regular straight (but not a flush)
- two of a kind
- four of a kind
- two pairs (but not four of a kind)
- a full house (a pair and three of a kind)
In all cases, we mean exactly the hand stated. For example, four of a kind does not count as 2 pairs and a full house does not count as a pair or three of a kind.
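A possible way to organize the counts is sketched below. It assumes the usual convention that an ace may be high or low in a straight (ten possible lowest ranks) and that a royal flush counts as a straight flush; neither convention is spelled out in the problem, and "two of a kind" is read as exactly one pair.

```python
from math import comb

total = comb(52, 5)

counts = {
    # 10 possible lowest ranks (A through 10), 4 suits.
    "straight flush": 10 * 4,
    # 10 rank runs, any suits, minus the 4 single-suit choices per run.
    "straight (not flush)": 10 * (4**5 - 4),
    # Rank of the pair, its 2 suits, 3 other distinct ranks, a suit for each.
    "one pair": 13 * comb(4, 2) * comb(12, 3) * 4**3,
    # Rank of the four, then any of the remaining 48 cards.
    "four of a kind": 13 * 48,
    # Two pair ranks, suits for each pair, then a fifth card of another rank.
    "two pairs": comb(13, 2) * comb(4, 2) ** 2 * 44,
    # Rank and suits of the triple, then rank and suits of the pair.
    "full house": 13 * comb(4, 3) * 12 * comb(4, 2),
}

for hand, count in counts.items():
    print(f"{hand}: {count} hands, probability {count / total:.6f}")
```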
Subsequence problem
Scoring subsequences or lengths of similar matches or runs is common to a variety of problems from matches in genetic codes to similar runs in bits.
Consider the following question about two sequences of letters. Set both sequences to have length \(k\). At each location of the sequences the probability of a match in letters is \(.7\) and the probability of a mismatch is \(.3\). At each location a match is assigned a score of \(4\) and a mismatch is assigned a score of \(-1\). The total score of the sequence is the sum of the scores over the \(k\) locations.
Answer the following:
- What is the PMF of the total score if \(k=5\) ?
- What is the PMF of the total score for a general \(k\) ?
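One way to tabulate the PMF is sketched here. It assumes that locations match independently, so the number of matches is binomial and the score is a linear function of it; the function name `score_pmf` is introduced for illustration.

```python
from fractions import Fraction
from math import comb

def score_pmf(k, p_match=Fraction(7, 10), match=4, mismatch=-1):
    """PMF of the total score, assuming independent locations."""
    pmf = {}
    for m in range(k + 1):  # m = number of matching locations
        score = match * m + mismatch * (k - m)
        pmf[score] = comb(k, m) * p_match**m * (1 - p_match) ** (k - m)
    return pmf

pmf5 = score_pmf(5)
for score in sorted(pmf5):
    print(score, pmf5[score], float(pmf5[score]))
```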
Mark-recapture
A common problem in ecology, social networks, and marketing is estimating the population of a particular species or type. The mark-recapture method is a classic approach to estimating the population.
Assume we want to estimate the population of sturgeon in a section of the Hudson river. We use the following procedure:
- Capture and mark \(h\) sturgeons
- Recapture \(n\) sturgeon and you find that \(y\) of them are marked
- The estimated sturgeon population is \(N = \frac{h n}{y} \).
Motivate statement \(3\) using the hypergeometric distribution.
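A numerical illustration of the motivation, as a sketch: treat the number of marked fish in the recapture as hypergeometric and search for the population size \(N\) that maximizes the likelihood of the observed \(y\). The values of `h`, `n`, `y` and the search range are made up for the example.

```python
from fractions import Fraction
from math import comb

def likelihood(N, h, n, y):
    """P(y marked among n recaptured) when h of the N fish are marked."""
    return Fraction(comb(h, y) * comb(N - h, n - y), comb(N, n))

h, n, y = 100, 60, 7  # illustrative values only
candidates = range(h + n - y, 3001)  # N must be at least h + n - y
best_N = max(candidates, key=lambda N: likelihood(N, h, n, y))
print(best_N, h * n / y)  # the maximizer sits next to h * n / y
```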
Introduction to Geometric random variables
Consider flipping a coin that is either heads (H) or tails (T), each with probability 1/2. The coin is flipped over and over (independently) until a head comes up. The outcome space is
\[ \Omega = \{H,TH,TTH,TTTH,\ldots\}. \]
(a) What is \( \mathbf{P}(TTH)\)?
(b) What is the chance that the coin is flipped exactly \(i\) times?
(c) What is the chance that the coin is flipped more than twice?
(d) Repeat the previous three questions for an unfair coin which has probability \(p\) of getting Tails.
[Author Mark Huber. Licensed under Creative Commons]
Roulette
Isabella is playing American roulette, where there are 38 spaces on a wheel, and there is a ball that is equally likely to land in each space. She plays 5 times, and the spins of the wheel are independent. If it lands in any of the 18 red spaces Isabella wins $1, but otherwise she loses $1. After her 5 plays, what is the probability that she ends up with more money than when she started?
[Author Mark Huber. Licensed under Creative Commons]
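Since each play moves her fortune by one dollar, Isabella ends ahead exactly when she wins at least 3 of the 5 plays. A short sketch of that binomial sum:

```python
from fractions import Fraction
from math import comb

p = Fraction(18, 38)  # chance of red on a single spin
plays = 5

# Ahead after 5 plays means at least 3 wins.
prob_ahead = sum(comb(plays, k) * p**k * (1 - p) ** (plays - k)
                 for k in range(3, plays + 1))
print(prob_ahead, float(prob_ahead))
```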
Digital communications system
A digital communications system consists of a transmitter and a receiver. During each short transmission interval the transmitter sends a signal which is interpreted as a zero, or it sends a different signal which is to be interpreted as a one. At the end of each interval, the receiver makes its best guess at what is transmitted. Consider the events:
\(T_0 = \{\mbox{Transmitter sends } 0\}, \quad T_1 = \{\mbox{Transmitter sends } 1\} \)
\(R_0 = \{\mbox{Receiver perceives } 0\}, \quad R_1 = \{\mbox{Receiver perceives } 1\} \)
Assume that \(\mathbf{P}(R_0 \mid T_0)=.99\), \(\mathbf{P}(R_1 \mid T_1)=.98\) and \(\mathbf{P}(T_1)=.5\).
- Compute probability of transmission error given \(R_1\).
- Compute the overall probability of a transmission error.
- Repeat the first two parts for \(\mathbf{P}(T_1)=.8\).
[Pitman page 54, problem 4]
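The calculation is an application of Bayes' rule and the law of total probability. A minimal sketch follows; the function name and argument names are illustrative, not from Pitman.

```python
from fractions import Fraction

def error_probabilities(p_t1,
                        p_r0_given_t0=Fraction(99, 100),
                        p_r1_given_t1=Fraction(98, 100)):
    """Return (P(error | R1), P(error)) for a given prior P(T1)."""
    p_t0 = 1 - p_t1
    p_r1_and_t0 = (1 - p_r0_given_t0) * p_t0  # sent 0, received 1
    p_r1_and_t1 = p_r1_given_t1 * p_t1        # sent 1, received 1
    p_r0_and_t1 = (1 - p_r1_given_t1) * p_t1  # sent 1, received 0
    error_given_r1 = p_r1_and_t0 / (p_r1_and_t0 + p_r1_and_t1)
    overall_error = p_r1_and_t0 + p_r0_and_t1
    return error_given_r1, overall_error

print(error_probabilities(Fraction(1, 2)))
print(error_probabilities(Fraction(4, 5)))
```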
Committee membership in the senate
The Senate contains 100 members; 51 are Democrats (or caucus with Democrats) and 49 are Republicans. A committee of 10 members is chosen at random.
- Compute the probability that there are \(n\) Republicans on the committee for \(n=1,\ldots,10\).
- Find the probability that the committee members are all the same party.
- Suppose you didn’t know how many Democrats there were in the Senate. You observe that the committee of \(10\) members contains \(k=7\) Democrats. Compute \(\mathbf{P}(M \mid k=7) \), where \(M\) is the number of Democrats in the Senate.
Polya’s urn
An urn contains \(4\) white balls and \(6\) black balls. A ball is chosen at random, and its color is noted. The ball is then replaced, along with \(3\) more balls of the same color. Then another ball is drawn at random from the urn.
- Find the chance that the second ball drawn is white.
- Given the second ball drawn is white, what is the probability that the first ball drawn is black ?
- Suppose the original contents of the urn are \(w\) white and \(b\) black balls, and that after drawing a ball we replace it along with \(d\) more balls of the same color. What is the probability that the second ball drawn is white (it should be \(\frac{w}{w+b}\) )?
[Pitman page 53. Problem 2]
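The two-step urn can be checked by conditioning on the color of the first ball. The sketch below assumes the drawn ball is returned together with \(d\) additional balls of its color; the function names are illustrative.

```python
from fractions import Fraction

def second_white(w, b, d):
    """P(second ball is white), conditioning on the first draw."""
    first_white = Fraction(w, w + b)
    return (first_white * Fraction(w + d, w + b + d)
            + (1 - first_white) * Fraction(w, w + b + d))

def first_black_given_second_white(w, b, d):
    """P(first ball black | second ball white), via Bayes' rule."""
    joint = Fraction(b, w + b) * Fraction(w, w + b + d)
    return joint / second_white(w, b, d)

print(second_white(4, 6, 3), first_black_given_second_white(4, 6, 3))
```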
Airline Overbooking
An airline knows that over the long run, 90% of passengers who reserve seats for a flight show up. On a particular flight with 300 seats, the airline sold 324 reservations.
- Assuming that passengers show up independently of each other, what is the chance that the flight will be overbooked ?
- Suppose that people tend to travel in groups. Would that increase or decrease the probability of overbooking ? Explain your answer.
- Redo the calculation in the first question assuming that passengers always travel in pairs. Are your answers to all three questions consistent ?
[Pitman p. 109, #9]
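An exact binomial tail gives a reference value for the approximations. In this sketch, "traveling in pairs" is modeled as 162 pairs, each of which shows up together with probability 0.9 independently of the other pairs; that modeling choice is an assumption made here.

```python
from math import comb

def binom_upper_tail(n, p, threshold):
    """P(S > threshold) for S ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(threshold + 1, n + 1))

# Independent passengers: overbooked if more than 300 of 324 show up.
p_indep = binom_upper_tail(324, 0.9, 300)
# Pairs: overbooked if more than 150 of the 162 pairs show up.
p_pairs = binom_upper_tail(162, 0.9, 150)
print(p_indep, p_pairs)
```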
Standard Normal Tail Bound
As usual define
\[\Phi(z) = \int_{-\infty}^z \phi(x) dx \quad\text{where} \quad \phi(x)=\frac{1}{\sqrt{2\pi}} e^{-\frac12 x^2}\]
Sometimes it is useful to have an estimate of \(1-\Phi(z)\) which rigorously bounds it from above (since we cannot write a closed-form formula for \(\Phi(z)\) ).
Follow these steps to prove that, for \(z>0\),
\[ 1- \Phi(z) < \frac{\phi(z)}{z}\,.\]
First argue that
\[ 1- \Phi(z) < \int^{\infty}_z \frac{x}{z}\phi(x) dx\,.\]
Then evaluate the integral on the right hand side to obtain the bound.
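The bound can be sanity-checked numerically for a few positive values of \(z\). This sketch uses the standard normal density with normalizing constant \(1/\sqrt{2\pi}\) and the complementary error function from Python's `math` module; it is a check, not a proof.

```python
from math import erfc, exp, pi, sqrt

def phi(x):
    """Standard normal density."""
    return exp(-0.5 * x * x) / sqrt(2 * pi)

def upper_tail(z):
    """1 - Phi(z), computed from the complementary error function."""
    return 0.5 * erfc(z / sqrt(2))

for z in (0.5, 1.0, 2.0, 3.0, 4.0):
    print(z, upper_tail(z), phi(z) / z)  # the bound should exceed the tail
```

The two columns get closer as \(z\) grows, which is what makes the bound useful in the tail.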
Basic Random Walk
Consider the following “game”: A marker is placed on the real line at the point zero. On each turn a coin is flipped which has a 1 printed on one side and a -1 printed on the other. If the 1 side lands face up, the marker is moved one unit in the positive direction, while if the -1 side lands face up, the marker is moved one unit in the negative direction. If the coin has probability \(p\) of landing with the 1 side face up, answer the following questions:
- Let \(p=\frac12\). After 10000 turns if you had to pick one site to find the marker which would you choose ?
- Again let \(p=\frac12\). What is the approximate chance that the marker is farther than 100 units from this most likely point after 10000 turns ? What is the approximate chance that the marker is farther than 300 units from this most likely point after 10000 turns ?
- Repeat the above questions with \(p=\frac{9}{10}\).
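A normal-approximation sketch for these questions: after \(n\) turns the position is \(2H-n\), where \(H\) is the number of times the 1 side comes up. The code centers on the mean position (used here as a stand-in for the most likely point) and omits a continuity correction; both are simplifying assumptions.

```python
from math import erfc, sqrt

def walk_mean_sd(n, p):
    """Mean and standard deviation of the position after n turns."""
    return n * (2 * p - 1), 2 * sqrt(n * p * (1 - p))

def prob_farther_than(n, p, d):
    """Normal approximation to P(|position - mean| > d)."""
    _, sd = walk_mean_sd(n, p)
    return erfc(d / (sd * sqrt(2)))  # equals 2 * (1 - Phi(d / sd))

for p in (0.5, 0.9):
    print(p, walk_mean_sd(10_000, p),
          prob_farther_than(10_000, p, 100),
          prob_farther_than(10_000, p, 300))
```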
Biased coins
Consider a random variable \(x \in \{ 0,1\} \), where \(0\) corresponds to heads and \(1\) corresponds to tails.
For a single coin flip: \( \mathbf{P}(x \mid p) = p^x(1-p)^{1-x}\).
For a sequence of \(n\) coin flips: \( \mathbf{P}(x_1,\ldots,x_n \mid p) = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_{i}}\).
I have a bag with three types of coins with the following probabilities of drawing each type:
\( \mathbf{P}(p=.5) = .7 \), \( \mathbf{P}(p=.1) = .2 \), \( \mathbf{P}(p=.9) = .1 \).
I draw a coin from the bag. I flip it \(n\) times resulting in a sequence \(x_1,\ldots,x_n\).
- Using Bayes' rule, provide the formula for
\[ \mathbf{P}(p = .1 \mid x_1,\ldots,x_n),\quad
\mathbf{P}(p = .5 \mid x_1,\ldots,x_n), \quad\text{and}\quad
\mathbf{P}(p = .9 \mid x_1,\ldots,x_n). \]
- If \((x_1,x_2,x_3,x_4,x_5,x_6,x_7,x_8)=(1,1,0,1,1,0,0,1)\), what is the most likely value of the probability \(p\) of the coin that was used ?
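The posterior over the three coin types can be computed directly from the prior and the likelihood. A minimal sketch with exact fractions; `posterior` is a helper name introduced here.

```python
from fractions import Fraction

# Prior over the coin's parameter p, as given in the problem.
prior = {
    Fraction(1, 2): Fraction(7, 10),
    Fraction(1, 10): Fraction(2, 10),
    Fraction(9, 10): Fraction(1, 10),
}

def posterior(flips, prior):
    """Bayes' rule: posterior is proportional to prior times likelihood."""
    ones = sum(flips)
    zeros = len(flips) - ones
    joint = {p: w * p**ones * (1 - p) ** zeros for p, w in prior.items()}
    total = sum(joint.values())
    return {p: v / total for p, v in joint.items()}

post = posterior((1, 1, 0, 1, 1, 0, 0, 1), prior)
most_likely = max(post, key=post.get)
for p, prob in post.items():
    print(p, float(prob))
```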
Taking classes
There are 18 students in a room. How many students are not majoring in math or science or computer science ?
7 of them are math majors
10 of them are science majors
10 of them are computer science majors
4 of them are math and cs majors
3 of them are science and math majors
5 of them are cs and science majors
1 of them is a math, cs and science major
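The count follows from inclusion-exclusion on the three majors. A small sketch of the arithmetic, with dictionary keys chosen for readability:

```python
singles = {"math": 7, "science": 10, "cs": 10}
pairs = {"math & cs": 4, "science & math": 3, "cs & science": 5}
all_three = 1
students = 18

# |M or S or C| = singles - pairwise overlaps + triple overlap.
in_some_major = sum(singles.values()) - sum(pairs.values()) + all_three
in_no_major = students - in_some_major
print(in_some_major, in_no_major)
```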
Chewing gum
Baseball players are chewing gum. Given gum flavor counts determine how many players were sampled.
22 of them chew fruit flavored gum
25 of them chew spearmint flavored gum
39 of them chew grape flavored gum
9 of them chew fruit and spearmint
17 of them chew spearmint and grape
20 of them chew fruit and grape
6 of them chew all
4 of them chew none
The chance of being English
English and American spellings are rigour and rigor, respectively. An English-speaking guest staying at a Paris hotel writes the word, and a letter is chosen at random from his spelling. The letter turns out to be a vowel (that is, any of e, a, i, o, u). If 40% of the English-speaking guests are American and 60% are English, what is the probability that the writer is American ?
[Ross, p. 107 #29]
Birthday problem
The birthday problem is a classic problem in probability.
Given \(n\) people in a room what is the probability that at least two of them have the same birthday ?
- Compute \(\mathbf{P}(n)\), the probability of a shared birthday among \(n\) people, assuming that a person is equally likely to be born on any day of the year.
In blog posts, Andy Gelman and Chris Mulligan discuss how the uniformity assumption may be incorrect and the effect this has on the birthday problem.
- Chris examined the uniformity assumption by looking at CDC data for one year in terms of number of births. He provides R code (that I slightly adapted) that you can run in RStudio to plot the number of births through the year. How different is this from uniform ?
- Given this observed distribution he then computes the difference between the result of the birthday problem given the observed distribution versus a uniform distribution. This is done using Monte Carlo simulation in R (again slightly adapted by me). Does the deviation from the uniform distribution have a strong effect ?
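For the uniform case in the first part, the probability can be computed from the complement (all birthdays distinct). This Python sketch assumes a 365-day year and is separate from the R code mentioned above.

```python
def birthday_collision(n, days=365):
    """P(at least two of n people share a birthday), uniform birthdays."""
    p_all_distinct = 1.0
    for i in range(n):
        p_all_distinct *= (days - i) / days
    return 1 - p_all_distinct

for n in (10, 22, 23, 50):
    print(n, birthday_collision(n))
```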
Chance of an Accident.
An insurance company has 50% urban and 50% rural customers. Every year each urban customer has an accident with probability \(\mu\) and each rural customer has an accident with probability \(\lambda\). Assume that the chance of an accident is independent from year to year and from customer to customer. In other words, conditioned on the customer being urban or rural, accidents in different years are independent.
A customer is randomly chosen. Let \(A_n\) be the event that this customer has an accident in year \(n\). Let \(U\) denote the event that this customer is urban and \(R\) the event that the customer is rural.
- Find \( \mathbf{P}(A_2|A_1) \).
- Are \(A_1\) and \(A_2\) independent in general ? Are there any conditions when it is true if not in general ?
- Show that \(\mathbf{P}(A_2|A_1) \geq \mathbf{P}(A_2) \).
To answer this question it is useful to know that for any positive \(a\) and \(b\), one has \( (a+b)^2 < 2(a^2 +b^2)\) as long as \(a \neq b\). In the case \(a = b\), one has of course \( (a+b)^2 = 2(a^2 +b^2)\). To prove this inequality, first show that \( (a+b)^2 +(a-b)^2= 2(a^2 +b^2)\) and then use the fact that \( (a-b)^2 >0 \) when \(a \neq b\).
- Find the probability that a driver has an accident in the 3rd year given that they had one in the 1st and 2nd years.
- Find the probability that a driver has an accident in the \(n\)-th year given that they had one in all of the previous years. What is the limit as \(n \rightarrow \infty\) ?
- Find the probability that a driver is an urban driver given that they had an accident in two successive years.
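The conditional probabilities can be explored numerically by mixing over the two customer types. This sketch uses the 50/50 split from the problem; the values of `mu` and `lam` are made up for illustration.

```python
from fractions import Fraction

def next_given_streak(mu, lam, n):
    """P(A_n | A_1, ..., A_{n-1}) for a 50/50 urban/rural mix."""
    # P(A_1, ..., A_n) = (mu**n + lam**n) / 2; the factors 1/2 cancel.
    return (mu**n + lam**n) / (mu ** (n - 1) + lam ** (n - 1))

def urban_given_two_accidents(mu, lam):
    """P(U | A_1, A_2), via Bayes' rule."""
    return mu**2 / (mu**2 + lam**2)

mu, lam = Fraction(1, 5), Fraction(1, 10)  # illustrative values only
print(next_given_streak(mu, lam, 1))   # P(A_1), the unconditional chance
print(next_given_streak(mu, lam, 2))   # P(A_2 | A_1)
print(float(next_given_streak(mu, lam, 50)))  # drifts toward max(mu, lam)
print(urban_given_two_accidents(mu, lam))
```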
Duels
Mathematicians and politicians throughout history have dueled.
Alexander Hamilton and Aaron Burr dueled.
The French mathematician Evariste Galois died in a duel.
Consider two individuals, say (H) and (B), dueling.
In each round they simultaneously shoot at each other, and the probability
of a fatal shot is \(0 < p < 1\).
1) What is the probability they are fatally injured in the same round ?
2) What is the probability that (B) will be fatally injured before (H) ?
Two dice
Two dice are rolled. Find the probabilities of the following events.
a) the maximum of the two numbers rolled is less than or equal to 2;
b) the maximum of the two numbers rolled is less than or equal to 3;
c) the maximum of the two numbers rolled is exactly equal to 3;
d) Repeat b) and c) with 3 replaced by \(x=1,\ldots,6\);
e) Denote \( \mathbf{P}(x)\) as the probability that the maximum number is exactly \(x\).
Compute \( \sum_{x=1}^6\mathbf{P}(x)\).
[Pitman Page 10, #7]
Pathway enrichment
A list of \(100\) genes is known to be part of the oxidative phosphorylation pathway.
My friend, a molecular biologist, screened the activity of \(5000\) genes in both diabetics and normal individuals.
He/she found \(500\) genes that were more active in normal individuals than in diabetics.
Of these genes, \(60\) belong to the list of genes that are part of the oxidative phosphorylation pathway.
What is the probability of this event happening by chance ? What is the scientific question behind the probability problem ?
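One common reading, taken here as an assumption, is a one-sided hypergeometric tail: if the 500 active genes were a uniformly random subset of the 5000 screened genes, and all 100 pathway genes were among the 5000, how likely is an overlap of 60 or more? A sketch of that computation:

```python
from fractions import Fraction
from math import comb

def hypergeom_upper_tail(N, K, n, k_min):
    """P(X >= k_min) when n items are drawn from N, of which K are special."""
    total = comb(N, n)
    hits = sum(comb(K, k) * comb(N - K, n - k)
               for k in range(k_min, min(K, n) + 1))
    return Fraction(hits, total)

# 5000 screened genes, 100 in the pathway, 500 found active, 60 overlapping.
p_value = float(hypergeom_upper_tail(5000, 100, 500, 60))
print(p_value)
```

Whether "exactly 60" or "60 or more" is the right event is part of the modeling question the problem asks about.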
Drawing tickets
A box contains tickets marked \(1,2,\ldots,n\). A ticket is drawn at random from the box.
Sampling with replacement — Then the ticket is replaced in the box and a second ticket is drawn at random. Find the probability of the following events:
a) the first ticket drawn is number 1 and the second is number 2;
b) the numbers on the two tickets are consecutive integers;
c) the second number drawn is bigger than the first number.
Sampling without replacement — The ticket is not replaced in the box and a second ticket is drawn at random.
d) Repeat a)-c).
[Pitman page 9, Problem 3]
Seating people
In how many ways can \(6\) people be seated in \(11\) vacant chairs that are arranged in a row ?
Inclusion of origin
Draw \(n\) points from the uniform distribution on the circle and draw the convex hull around these points. What is the probability that the origin (center of the circle) is contained in the convex hull ?
[From: The Probabilistic Method by Alon and Spencer]
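A Monte Carlo sketch can be used to guess and check a formula. It assumes the points are drawn independently and uniformly on the circumference, and uses the fact that the center lies outside the hull exactly when all points fit in a half circle, that is, when the largest angular gap between neighboring points is at least \(\pi\). The trial count and seed are arbitrary choices.

```python
import math
import random

def origin_in_hull(angles):
    """True if the center is inside the hull of points at the given angles."""
    a = sorted(angles)
    gaps = [later - earlier for earlier, later in zip(a[:-1], a[1:])]
    gaps.append(2 * math.pi - (a[-1] - a[0]))  # wrap-around gap
    return max(gaps) < math.pi

def estimate(n, trials=20_000, seed=0):
    """Monte Carlo estimate of P(origin in convex hull of n points)."""
    rng = random.Random(seed)
    hits = sum(
        origin_in_hull([rng.uniform(0, 2 * math.pi) for _ in range(n)])
        for _ in range(trials)
    )
    return hits / trials

for n in (3, 4, 5, 8):
    print(n, estimate(n), 1 - n / 2 ** (n - 1))  # estimate vs. candidate formula
```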