Author Archives: sayan@stat.duke.edu
Linear regression
Consider the following model:
\(X_1,…,X_n \stackrel{iid}{\sim} f(x), \quad Y_i = \theta X_i + \varepsilon_i, \quad \varepsilon_i \stackrel{iid}{\sim} \mbox{N}(0,\sigma^2).\)
- Compute \({\mathbf E }(Y \mid X)\)
- Compute \({\mathbf E }(\varepsilon \mid X)\)
- Compute \({\mathbf E }( \varepsilon)\)
- Show \( \theta = \frac{{\mathbf E}(XY)}{{\mathbf E}(X^2)}\)
Clinical trial
Let \(X\) be the number of patients in a clinical trial with a successful outcome. Let \(P\) be the probability of success for an individual patient. We assume before the trial begins that \(P\) is uniform on \([0,1]\). Compute
- \(f(P \mid X)\)
- \( {\mathbf E}( P \mid X)\)
- \( {\mathbf Var}( P \mid X)\)
Order statistics II
Suppose \(X_1, … , X_{17}\) are iid uniform on \( (.5,.8) \). What is \({\mathbf{E}} [X_{(k)}] \) ?
Order statistics I
Suppose \(X_1, … , X_n \stackrel{iid}{\sim} U(0,1) \). How large must \(n\) be to have that \({\mathbf{P}}(X_{(n)} \geq .95) \geq 1/2\) ?
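One way to check an answer to this problem numerically is sketched below. It assumes only the standard fact that \({\mathbf{P}}(X_{(n)} \geq t) = 1 - t^n\) for iid \(U(0,1)\) draws; the function name `smallest_n` is introduced here for illustration.

```python
import math


def smallest_n(threshold: float = 0.95, target: float = 0.5) -> int:
    """Smallest n such that P(max of n iid U(0,1) draws >= threshold) >= target."""
    n = 1
    # The maximum is below `threshold` only if all n draws are below it,
    # so P(max >= threshold) = 1 - threshold**n.
    while 1 - threshold ** n < target:
        n += 1
    return n


n_search = smallest_n()

# The same answer from solving 1 - 0.95**n >= 1/2 for n directly.
n_closed = math.ceil(math.log(0.5) / math.log(0.95))

print(n_search, n_closed)
```

The brute-force search and the logarithm formula should agree; if they do not, the inequality was probably flipped somewhere.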
Beta-binomial
You have a sequence of coin flips \(X_1,…,X_n\) drawn iid from a Bernoulli distribution with unknown parameter \(p\) and known fixed \(n\). Assume a priori that the coin's parameter \(p\) follows a Beta distribution with parameters \(\alpha,\beta\).
- Given the sequence \(X_1,…,X_n\) what is the posterior pdf of \(p\) ?
- For what value of \(p\) is the maximum of the posterior pdf attained ?
Hint: If \(X\) is distributed Bernoulli(p) then for \(x=1,0\) one has \(P(X=x)=p^x(1-p)^{(1-x)}\). Furthermore, if \(X_1,X_2\) are i.i.d. Bernoulli(p) then
\[P(X_1=x_1, X_2=x_2 )=P(X_1=x_1)P(X_2=x_2 )=p^{x_1}(1-p)^{(1-x_1)}p^{x_2}(1-p)^{(1-x_2)}\]
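A minimal sketch for checking a hand derivation follows. It assumes the standard conjugacy result that a Beta prior combined with Bernoulli data gives a Beta posterior, and that the mode formula is only used when both posterior parameters exceed 1. The function names and the example data are hypothetical.

```python
from fractions import Fraction


def posterior_params(xs, alpha, beta):
    """Return (a, b) such that p | xs ~ Beta(a, b), given a Beta(alpha, beta) prior."""
    successes = sum(xs)
    failures = len(xs) - successes
    return alpha + successes, beta + failures


def posterior_mode(a, b):
    """Mode of Beta(a, b); this formula is valid only when a > 1 and b > 1."""
    if a <= 1 or b <= 1:
        raise ValueError("interior mode requires a > 1 and b > 1")
    return Fraction(a - 1) / Fraction(a + b - 2)


# Hypothetical example: four flips and a Beta(2, 2) prior.
a, b = posterior_params([1, 0, 1, 1], alpha=2, beta=2)
print(a, b, posterior_mode(a, b))
```

Using `Fraction` keeps the mode exact, which makes it easier to compare against an algebraic answer.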
Car tires
The air pressure in the left and right front tires of a car are random variables \(X\) and \(Y\), respectively. Tires should be filled to 26psi. The joint pdf is
\( f(x,y) = K(x^2+y^2), \quad 20 \leq x,y \leq 30 \)
- What is \(K\) ?
- Are the random variables independent ?
- What is the probability that both tires are underfilled ?
- What is the probability that \( |X-Y| \leq 3 \) ?
- What are the marginal densities ?
Joint of min and max
Let \(X_1,…,X_n \stackrel{iid}{\sim} \mbox{Exp}(\lambda) \)
Let \(V = \mbox{min}(X_1,…,X_n)\) and \(W = \mbox{max}(X_1,…,X_n)\).
What is the joint distribution of \(V,W\) ? Are they independent ?
Joint density part 1
Let \(X\) and \(Y\) have joint density
\(f(x,y) = 90(y-x)^8, \quad 0<x<y<1\)
- State the marginal distribution for \(X\)
- State the marginal distribution for \(Y\)
- Are these two random variables independent?
- What is \(\mathbf{P}(Y > 2X)\)
- Fill in the blanks “The density \(f(x,y)\) above is the joint density of the _________ and __________ of ten independent uniform \((0,1)\) random variables.”
[Adapted from Pitman pg 354]
Expectation of min of exponentials
There are \(15\) stock brokers. The returns (in thousands of dollars) of the brokers are modeled as independent exponential random variables \(X_1 \sim \mbox{Exp}(\lambda_1),…,X_{15} \sim \mbox{Exp}(\lambda_{15})\). Define \(Z = \min\{X_1,…,X_{15}\}\).
What is \(\mathbf{E}(Z)\) ?
Two normals
A sequence \(X_1,…,X_n\) is drawn iid from either \(\mbox{N}(0,1)\) or \(\mbox{N}(0,10)\) with equal prior probability.
- State the formulae for the probabilities that the sequence came from the normal with mean \(1\) or mean \(10\).
- If you know the mean of the normal is \(1\) then what are the variances of \(S = \sum_i X_i\) and \( \hat{\mu} = \frac{1}{n} \sum_i X_i\) ?
- What is \(\mbox{Pr}(Z > \max\{x_1,…,x_n\})\) if \(\mu =1\), and if \(\mu =10\) ?
Limit for mixtures
Consider the following mixture distribution.
- Draw \(X \sim \mbox{Be}(p=.3)\)
- If \(X=1\) then \(Y \sim \mbox{Geo}(p_1)\)
- If \(X= 0\) then \(Y \sim \mbox{Bin}(n,p_2)\)
Consider the sequence of random variables \(Y_1,…,Y_{200}\) drawn iid from the above random experiment.
Use the central limit theorem to state the distribution of \(S = \frac{1}{200} \sum_{i=1}^{200} Y_i\).
(Here \(\mbox{Be}(p)\) is the Bernoulli distribution with parameter \(p\) and \(\mbox{Geo}(p)\) is the geometric distribution with the parameter \(p\). )
Strontium
Assume we have a large number of particles \(N\) of Strontium. The decay model for Strontium is exponential in that \(\mathbf{P}(T > t) = e^{- \lambda t}\); this is the probability that an atom survives beyond time \(t\).
- The half-life of a substance is the amount of time it takes for an appreciable amount of the substance to be reduced in half. If the half-life of strontium is 28 years what is the decay parameter of the exponential ?
- What is the probability Strontium lasts at least 50 years, \(\mathbf{P}(T > 50) \) ?
- Suppose we have \(5\) radioactive substances whose decay times are modeled by independent exponential random variables \(X_1,…,X_5\) with parameters \(\lambda_1,…,\lambda_5\). What is the pdf of \(\min\{X_1,…,X_5\}\) ?
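The three parts above can be checked numerically with a short sketch. It assumes the half-life satisfies \(e^{-\lambda \cdot 28} = 1/2\) and uses the standard fact that the minimum of independent exponentials is exponential with the summed rate. The rates passed to `min_pdf` are hypothetical placeholders, since the problem leaves \(\lambda_1,…,\lambda_5\) symbolic.

```python
import math

HALF_LIFE_YEARS = 28.0

# Solve exp(-lam * half_life) = 1/2 for lam.
lam = math.log(2) / HALF_LIFE_YEARS


def survival(t, rate=lam):
    """P(T > t) for an exponential lifetime with the given rate."""
    return math.exp(-rate * t)


def min_pdf(t, rates):
    """Density at t >= 0 of min(X_1, ..., X_k) for independent Exp(rate_i) variables."""
    total = sum(rates)
    return total * math.exp(-total * t)


p50 = survival(50)
print(lam, p50)

# Hypothetical rates, for illustration only.
print(min_pdf(0.1, [1.0, 2.0, 3.0, 4.0, 5.0]))
```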
Expectation of hierarchical model
Consider the following hierarchical random variable
- \(\lambda \sim \mbox{Geometric}(p)\)
- \(Y \mid \lambda \sim \mbox{Poisson}(\lambda)\)
Expectation of geometric
Use the expectation as tail sum tool to compute the expectation of the geometric distribution.
Expectation of mixture distribution
Consider the following mixture distribution.
- Draw \(X \sim \mbox{Ber}(p=.3)\)
- If \(X=1\) then \(Y \sim \mbox{Geometric}(p_1)\)
- If \(X= 0\) then \(Y \sim \mbox{Bin}(n,p_2)\)
What is \(\mathbf{E}(Y)\) ? (*) What is \(\mathbf{E}(Y \mid X )\) ?
Maximum of die rolls
Let \(X_1,…,X_5\) be five iid rolls of a six-sided die. Let \(Z = \mbox{max}\{X_1,…,X_5\}\). Compute \(\mathbf{E}(Z)\).
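A hedged sketch of one route to the answer uses the tail-sum identity \(\mathbf{E}(Z) = \sum_{k \geq 1} \mathbf{P}(Z \geq k)\) together with \(\mathbf{P}(Z \geq k) = 1 - \left(\frac{k-1}{6}\right)^5\). The function name `expected_max` is introduced here for illustration.

```python
from fractions import Fraction


def expected_max(rolls, sides=6):
    """E[max of `rolls` iid fair die rolls], computed with the tail-sum formula."""
    # P(max >= k) = 1 - P(all rolls <= k - 1) = 1 - ((k - 1) / sides) ** rolls
    return sum(1 - Fraction(k - 1, sides) ** rolls for k in range(1, sides + 1))


e_max = expected_max(5)
print(e_max, float(e_max))
```

With a single roll the function should reduce to the mean of one die, which is a quick sanity check.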
Sums of Poisson
A gambler bets ten times on events of probability \(1/10\), then twenty times on events with probability \(1/20\), then thirty times on events with probability \(1/30\), then forty times on events with probability \(1/40\). Assuming the events are independent, what is the approximate distribution of the number of times the gambler wins ? (use Poisson approx. of binomial)
[Pitman 2.5, pg 227]
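To see how good the Poisson approximation is here, the sketch below computes the exact distribution of the total number of wins by convolving the four binomial PMFs and compares it with a Poisson PMF whose rate is the sum of the four means. The helper names are illustrative.

```python
import math


def binom_pmf(n, p):
    """PMF of Binomial(n, p) as a list indexed by the number of successes."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]


def convolve(a, b):
    """PMF of the sum of two independent non-negative integer random variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


blocks = [(10, 1 / 10), (20, 1 / 20), (30, 1 / 30), (40, 1 / 40)]

exact = [1.0]  # PMF of the constant 0
for n, p in blocks:
    exact = convolve(exact, binom_pmf(n, p))

rate = sum(n * p for n, p in blocks)  # each block contributes mean n * p = 1
poisson = [math.exp(-rate) * rate ** k / math.factorial(k) for k in range(len(exact))]

max_gap = max(abs(e - q) for e, q in zip(exact, poisson))
print(rate, max_gap)
```

The value `max_gap` is the largest pointwise difference between the exact and approximate PMFs.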
Polonium data
Look at the linked table summarizing the radioactive decay counts of polonium recorded by Rutherford and Geiger (1910), representing the number of scintillations in 2608 1/8-minute intervals. For example, there were 57 intervals with zero counts. The counts can be thought of as being approximately Poisson distributed.
- Use the fact that for the Poisson distribution \( \mathbf{E}[X] = \lambda \) to estimate the rate parameter. This is using the method of moments to estimate a parameter.
- Maximize the likelihood to estimate \( \lambda\).
Random stock brokers
There are \(15\) stock brokers. The returns (in thousands of dollars) on the brokers are modeled
\( X_1,…,X_{15} \stackrel{iid}{\sim} \mbox{N}(0,1).\)
What is the probability, under the above random model, that at least one broker would bring in more than $1000 ?
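A small numerical sketch follows. Since returns are in thousands of dollars, it reads the threshold $1000 as the value \(1\), and it uses independence to write the answer as \(1 - \Phi(1)^{15}\). The normal CDF is built from `math.erf`; the function names are illustrative.

```python
import math


def normal_cdf(x):
    """Standard normal CDF, expressed through the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def prob_at_least_one_exceeds(threshold, n):
    """P(max of n iid N(0,1) draws > threshold) = 1 - Phi(threshold) ** n."""
    return 1 - normal_cdf(threshold) ** n


p = prob_at_least_one_exceeds(1.0, 15)
print(p)
```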
Conditional Poisson
The following is a hierarchical model.
- \(\lambda \sim \mbox{Uniform}[1,2]\)
- \(Y \mid \lambda \sim \mbox{Poisson}(\lambda)\)
What is \(\mathbf{E}(Y)\) ?
Mixture of Poisson
The following is a mixture model. The following experiment is used to draw a random variable \(Y\). With probability \(p\) draw from a Poisson distribution with parameter \(\lambda = 1\), and with probability \(1-p\) draw from a Poisson distribution with parameter \(\lambda =2 \).
What is \(\mathbf{E}(Y)\) ?
Variance of binomial vs. hypergeometric
Given \(N\) balls, of which \(r\) are red and the rest are green, draw \(n\) balls. Denote by \(X\) the number of red balls drawn when sampling with replacement and by \(Y\) the number of red balls drawn when sampling without replacement.
- What is the difference between the variance of \(X\) and the variance of \(Y\) ?
- For what values of \(N,n,r\) is the variance the largest ?
Expectation of geometric distribution
Compute the expectation of the geometric distribution using the fact that in this case
\(\mathbf{E}(X)= \sum_{k=1}^{\infty} \mathbf{Pr}(X\geq k) \)
Expectations of die rolls
A fair die is rolled ten times. Find numerical values for the expectations of each of the following random variables
- the sum of the numbers in the ten rolls;
- the sum of the largest two numbers in the first three rolls;
- the maximum number in the first five rolls;
- the number of multiples of three in the first ten rolls;
- the number of faces which fail to appear in the ten rolls;
- the number of different faces that appear in the ten rolls.
[From Pitman page 183]
Subsequence problem
Scoring subsequences or lengths of similar matches or runs is common to a variety of problems from matches in genetic codes to similar runs in bits.
Consider the following question about two sequences of letters. Set both sequences to have length \(k\). At each location of the sequences the probability of a match in letters is \(.7\) and the probability of a mismatch is \(.3\). At each location a match is assigned a score of \(4\) and a mismatch is assigned a score of \(-1\). The total score of the sequence is the sum of the scores over the \(k\) locations.
Answer the following:
- What is the PMF of the total score if \(k=5\) ?
- What is the PMF of the total score for a general \(k\) ?
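One way to tabulate the PMF is sketched below. It assumes that matches at different locations are independent, so the number of matches \(M\) is Binomial\((k, .7)\) and the total score is \(4M - (k - M)\). The function name `score_pmf` is introduced for illustration.

```python
import math


def score_pmf(k, p_match=0.7, match_score=4, mismatch_score=-1):
    """Map each attainable total score to its probability for sequences of length k."""
    pmf = {}
    for m in range(k + 1):  # m = number of matching locations
        score = match_score * m + mismatch_score * (k - m)
        # Distinct m give distinct scores because match_score != mismatch_score.
        pmf[score] = math.comb(k, m) * p_match ** m * (1 - p_match) ** (k - m)
    return pmf


pmf5 = score_pmf(5)
for score in sorted(pmf5):
    print(score, pmf5[score])
```

Calling `score_pmf(k)` with other values of `k` gives the general case numerically, which can be compared against a closed-form answer.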
Mark-recapture
A common problem in ecology, social networks, and marketing is estimating the population of a particular species or type. The mark-recapture method is a classic approach to estimating the population.
Assume we want to estimate the population of sturgeon in a section of the Hudson river. We use the following procedure:
- Capture and mark \(h\) sturgeons
- Recapture \(n\) sturgeon and you find that \(y\) of them are marked
- The estimated sturgeon population is \(N = \frac{h n}{y} \).
Motivate statement \(3\) using the hypergeometric distribution.
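A numerical illustration of the motivation is sketched below: treat the number of marked fish in the recapture as hypergeometric and find the population size \(N\) that maximizes the likelihood. The values of `h`, `n`, `y` and the search bound `upper` are hypothetical, chosen only so the search is small.

```python
from fractions import Fraction
from math import comb


def likelihood(N, h, n, y):
    """P(y marked among n recaptured) when h of the N fish are marked."""
    return Fraction(comb(h, y) * comb(N - h, n - y), comb(N, n))


def mle_population(h, n, y, upper=2000):
    """Population size in [h + n - y, upper] that maximizes the hypergeometric likelihood."""
    lowest = h + (n - y)  # at least h marked fish plus the unmarked ones that were seen
    return max(range(lowest, upper + 1), key=lambda N: likelihood(N, h, n, y))


# Hypothetical numbers, for illustration only.
h, n, y = 50, 40, 7
N_hat = mle_population(h, n, y)
print(N_hat, h * n / y)
```

For these inputs the maximizer lands on the integer part of \(hn/y\), which is the sense in which the estimate in statement \(3\) is a maximum likelihood estimate.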
Digital communications system
A digital communications system consists of a transmitter and a receiver. During each short transmission interval the transmitter sends a signal which is interpreted as a zero, or it sends a different signal which is to be interpreted as a one. At the end of each interval, the receiver makes its best guess at what is transmitted. Consider the events:
\(T_0 = \{\mbox{Transmitter sends } 0\}, \quad T_1 = \{\mbox{Transmitter sends } 1\} \)
\(R_0 = \{\mbox{Receiver perceives } 0\}, \quad R_1 = \{\mbox{Receiver perceives } 1\} \)
Assume that \(\mathbf{P}(R_0 \mid T_0)=.99\), \(\mathbf{P}(R_1 \mid T_1)=.98\) and \(\mathbf{P}(T_1)=.5\).
- Compute probability of transmission error given \(R_1\).
- Compute the overall probability of a transmission error.
- Repeat the two parts above for \(\mathbf{P}(T_1)=.8\).
[Pitman page 54, problem 4]
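The Bayes' rule computation can be sketched as follows. It reads "transmission error given \(R_1\)" as \(\mathbf{P}(T_0 \mid R_1)\) and "overall error" as \(\mathbf{P}(R_1, T_0) + \mathbf{P}(R_0, T_1)\); both readings are assumptions, as is the function name.

```python
def error_rates(p_t1, p_r0_given_t0=0.99, p_r1_given_t1=0.98):
    """Return (P(error | R1), P(error)) for the channel described above."""
    p_t0 = 1 - p_t1
    p_r1_given_t0 = 1 - p_r0_given_t0  # a 0 is sent but a 1 is perceived
    p_r0_given_t1 = 1 - p_r1_given_t1  # a 1 is sent but a 0 is perceived

    # Law of total probability for P(R1), then Bayes' rule for P(T0 | R1).
    p_r1 = p_r1_given_t0 * p_t0 + p_r1_given_t1 * p_t1
    error_given_r1 = p_r1_given_t0 * p_t0 / p_r1

    overall_error = p_r1_given_t0 * p_t0 + p_r0_given_t1 * p_t1
    return error_given_r1, overall_error


for prior in (0.5, 0.8):
    print(prior, error_rates(prior))
```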
Committee membership in the senate
A club contains 100 members; 51 are Democrats (or caucus with Democrats) and 49 are Republicans. A committee of 10 members is chosen at random.
- Compute the probability of \(n\) Republicans on the committee for \(n=1,…,10\).
- Find the probability that the committee members are all the same party.
- Suppose you didn’t know how many Democrats there were in the senate. You observe that the committee of \(10\) members consists of \(k=7\) Democrats. Compute \(\mathbf{P}(M|k=7) \), where \(M\) is the number of Democrats in the Senate.
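The first two parts can be checked with a short sketch that treats the number of Republicans on the committee as hypergeometric. The third part needs a prior on \(M\), which the problem does not state, so it is not covered here. The constant and function names are illustrative.

```python
from fractions import Fraction
from math import comb

DEMOCRATS, REPUBLICANS, COMMITTEE = 51, 49, 10


def prob_republicans(n):
    """P(exactly n Republicans on a committee drawn uniformly at random)."""
    return Fraction(
        comb(REPUBLICANS, n) * comb(DEMOCRATS, COMMITTEE - n),
        comb(DEMOCRATS + REPUBLICANS, COMMITTEE),
    )


pmf = {n: prob_republicans(n) for n in range(COMMITTEE + 1)}

# All the same party: either no Republicans or all Republicans.
same_party = pmf[0] + pmf[COMMITTEE]

for n, prob in pmf.items():
    print(n, float(prob))
print(float(same_party))
```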
Polya’s urn
An urn contains \(4\) white balls and \(6\) black balls. A ball is chosen at random, and its color is noted. The ball is then replaced, along with \(3\) more balls of the same color. Then another ball is drawn at random from the urn.
- Find the chance that the second ball drawn is white.
- Given the second ball drawn is white, what is the probability that the first ball drawn is black ?
- Suppose the original contents of the urn are \(w\) white and \(b\) black balls. Also after drawing a ball we replace with \(d\) balls of the same color. What is the probability that the second ball drawn is white (it should be \(\frac{w}{w+b}\) )?
[Pitman page 53. Problem 2]
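A minimal sketch for checking all three parts with exact arithmetic follows. It assumes that after each draw the ball is returned together with \(d\) additional balls of the same color, as in the first paragraph of the problem. The function names are illustrative.

```python
from fractions import Fraction


def second_white(w, b, d):
    """P(second ball is white), conditioning on the color of the first ball."""
    total = w + b
    first_white = Fraction(w, total)
    first_black = Fraction(b, total)
    return (first_white * Fraction(w + d, total + d)
            + first_black * Fraction(w, total + d))


def first_black_given_second_white(w, b, d):
    """P(first ball is black | second ball is white), by Bayes' rule."""
    total = w + b
    joint = Fraction(b, total) * Fraction(w, total + d)
    return joint / second_white(w, b, d)


print(second_white(4, 6, 3))
print(first_black_given_second_white(4, 6, 3))
```

Trying other values of `w`, `b` and `d` is a quick way to confirm the claim that the second-draw probability stays at \(\frac{w}{w+b}\).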
Biased coins
Given a random variable \(x \in \{ 0,1\} \) where \(0\) corresponds to heads and \(1\) corresponds to tails.
For a single coin flip: \( \mathbf{P}(x \mid p) = p^x(1-p)^{1-x}\).
For a sequence of \(n\) coin flips: \( \mathbf{P}(x_1,…,x_n \mid p) = \prod_{i=1}^n p^{x_i}(1-p)^{1-x_{i}}\).
I have a bag with three types of coins with the following probabilities of drawing each type:
\( \mathbf{P}(p=.5) = .7 \), \( \mathbf{P}(p=.1) = .2 \), \( \mathbf{P}(p=.9) = .1 \).
I draw a coin from the bag. I flip it \(n\) times resulting in a sequence \(X_1,…,X_n\).
- Using Bayes rule provide the formula for
\[ \mathbf{P}(p = .1 \mid x_1,…,x_n),\quad
\mathbf{P}(p = .5 \mid x_1,…,x_n), \quad\text{and}\quad
\mathbf{P}(p = .9 \mid x_1,…,x_n). \]
- If \((x_1,x_2,x_3,x_4,x_5,x_6,x_7,x_8)=(1,1,0,1,1,0,0,1)\) what is the most likely value of the probability \(p\) of the coin that was used ?
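The posterior over the three coin types can be computed directly, as in the sketch below. It uses the prior and likelihood exactly as stated above; the names `PRIOR` and `posterior` are introduced for illustration.

```python
# Prior probability of drawing each coin type, keyed by its parameter p.
PRIOR = {0.5: 0.7, 0.1: 0.2, 0.9: 0.1}


def posterior(xs, prior=PRIOR):
    """Posterior P(p | x_1, ..., x_n) over the coin types, by Bayes' rule."""
    ones = sum(xs)
    zeros = len(xs) - ones
    # Unnormalized posterior: prior times the Bernoulli likelihood of the sequence.
    joint = {p: weight * p ** ones * (1 - p) ** zeros for p, weight in prior.items()}
    norm = sum(joint.values())
    return {p: value / norm for p, value in joint.items()}


post = posterior([1, 1, 0, 1, 1, 0, 0, 1])
most_likely = max(post, key=post.get)
print(post, most_likely)
```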
Taking classes
There are 18 students in a room. How many students are not majoring in math or science or computer science ?
7 of them are math majors
10 of them are science majors
10 of them are computer science majors
4 of them are math and cs majors
3 of them are science and math majors
5 of them are cs and science majors
1 of them is a math, cs and science major
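The count can be checked with inclusion-exclusion for three sets, sketched below. The function name `union_of_three` and its argument order are illustrative.

```python
def union_of_three(a, b, c, ab, ac, bc, abc):
    """|A or B or C| by inclusion-exclusion, given single, pairwise and triple counts."""
    return a + b + c - ab - ac - bc + abc


# A = math, B = science, C = computer science, using the counts listed above.
in_some_major = union_of_three(a=7, b=10, c=10, ab=3, ac=4, bc=5, abc=1)
in_no_major = 18 - in_some_major
print(in_some_major, in_no_major)
```

The same function applies to any three-set counting problem of this shape.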
Chewing gum
Baseball players are chewing gum. Given gum flavor counts determine how many players were sampled.
22 of them chew fruit flavored gum
25 of them chew spearmint flavored gum
39 of them chew grape flavored gum
9 of them chew fruit and spearmint
17 of them chew spearmint and grape
20 of them chew fruit and grape
6 of them chew all
4 of them chew none
Birthday problem
The birthday problem is a classic problem in probability.
Given \(n\) people in a room what is the probability that at least two of them have the same birthday ?
- Compute \(\mathbf{P}(n)\) assuming that a person is equally likely to be born on any day of the year.
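Under the uniformity assumption, the probability can be computed through the complement (all \(n\) birthdays distinct). The sketch below assumes a 365-day year and ignores leap days; the function name is illustrative.

```python
def prob_shared_birthday(n, days=365):
    """P(at least two of n people share a birthday), with birthdays uniform over `days`."""
    if n > days:
        return 1.0  # pigeonhole: more people than days forces a shared birthday
    p_distinct = 1.0
    for i in range(n):
        # Person i + 1 must avoid the i birthdays already taken.
        p_distinct *= (days - i) / days
    return 1 - p_distinct


for n in (10, 22, 23, 50):
    print(n, prob_shared_birthday(n))
```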
In blogs Andy Gelman and Chris Mulligan talk about how the uniformity assumption may be incorrect and the effect this has on the birthday problem.
- Chris examined the uniformity assumption by looking at CDC data for one year in terms of number of births. He provides R code (that I slightly adapted) that you can run in RStudio to plot the number of births through the year. How different is this from uniform ?
- Given this observed distribution he then computes the difference between the result of the birthday problem given the observed distribution versus a uniform distribution. This is done using Monte Carlo simulation in R (again slightly adapted by me). Does the deviation from the uniform distribution have a strong effect ?
Duels
Mathematicians and politicians throughout history have dueled.
Alexander Hamilton and Aaron Burr dueled.
The French mathematician Evariste Galois died in a duel.
Consider two individuals, (H) and (B), dueling. In each round they simultaneously shoot at each other, and the probability of a fatal shot is \(0 < p < 1\).
1) What is the probability they are fatally injured in the same round ?
2) What is the probability that (B) will be fatally injured before (H) ?
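A hedged numerical sketch of both questions follows. It makes assumptions the problem leaves implicit: each shot is fatal independently with probability \(p\), rounds repeat until at least one duelist is fatally hit, and "(B) before (H)" means (B) is hit in a round where (H) is not. The function sums over the round in which the duel ends.

```python
def duel_probabilities(p, max_rounds=10_000):
    """Return (P(both hit in the same round), P(B alone is hit in the final round))."""
    same_round = 0.0
    b_first = 0.0
    both_alive = 1.0  # probability that nobody has been hit in earlier rounds
    for _ in range(max_rounds):
        same_round += both_alive * p * p
        b_first += both_alive * p * (1 - p)
        both_alive *= (1 - p) ** 2
    return same_round, b_first


same, b_before_h = duel_probabilities(0.5)
print(same, b_before_h)
```

By symmetry the probability that (H) alone is hit equals `b_before_h`, so `same + 2 * b_before_h` should come out to 1 under these assumptions.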