Author Archives: sayan@stat.duke.edu
Linear regression
Consider the following model:
\(X_1,…,X_n \stackrel{iid}{\sim} f(x), \quad Y_i = \theta X_i + \varepsilon_i, \quad \varepsilon_i \stackrel{iid}{\sim} \mbox{N}(0,\sigma^2).\)
- Compute \({\mathbf E }(Y \mid X)\)
- Compute \({\mathbf E }(\varepsilon \mid X)\)
- Compute \({\mathbf E }( \varepsilon)\)
- Show \( \theta = \frac{{\mathbf E}(XY)}{{\mathbf E}(X^2)}\)
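The last identity follows from \({\mathbf E}(XY) = \theta\,{\mathbf E}(X^2) + {\mathbf E}(X\varepsilon)\) with \({\mathbf E}(X\varepsilon)=0\) when \(\varepsilon\) is independent of \(X\). A small simulation can serve as a sanity check. The sketch below assumes, purely for illustration, that \(f\) is the Uniform(0,1) density, \(\theta = 2\) and \(\sigma = 1\); none of these values come from the problem.

```python
import random

def moment_ratio(theta, sigma, n, seed=0):
    """Sample analogue of E(XY) / E(X^2) under Y = theta * X + eps."""
    rng = random.Random(seed)
    sum_xy = 0.0
    sum_xx = 0.0
    for _ in range(n):
        x = rng.random()                       # assumed f: Uniform(0, 1)
        y = theta * x + rng.gauss(0.0, sigma)  # eps ~ N(0, sigma^2), drawn independently of x
        sum_xy += x * y
        sum_xx += x * x
    return sum_xy / sum_xx

# Illustrative parameter values only.
print(moment_ratio(theta=2.0, sigma=1.0, n=100_000))
```

The printed ratio should land close to the chosen \(\theta\), up to simulation noise.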
Clinical trial
Let \(X\) be the number of patients in a clinical trial with a successful outcome. Let \(P\) be the probability of success for an individual patient. We assume before the trial begins that \(P\) is uniform on \([0,1]\). Compute
- \(f(P \mid X)\)
- \( {\mathbf E}( P \mid X)\)
- \( {\mathbf Var}( P \mid X)\)
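One way to check an answer numerically is sketched below. It assumes that the trial has \(n\) patients and that \(X \mid P\) is Binomial\((n,P)\); the symbol \(n\) and the binomial model are assumptions, since the problem does not name the trial size. Under those assumptions the posterior is a Beta distribution with parameters \(X+1\) and \(n-X+1\).

```python
from fractions import Fraction

def uniform_prior_posterior(n, x):
    """Posterior of P after x successes among n patients, with a Uniform[0,1] prior.

    Assumes X | P ~ Binomial(n, P), so the posterior is Beta(a, b)
    with a = x + 1 and b = n - x + 1.
    """
    a, b = x + 1, n - x + 1
    mean = Fraction(a, a + b)                          # Beta mean a / (a + b)
    var = Fraction(a * b, (a + b) ** 2 * (a + b + 1))  # Beta variance
    return a, b, mean, var

# Hypothetical trial: 7 successes among 10 patients.
print(uniform_prior_posterior(n=10, x=7))
```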
Order statistics II
Suppose \(X_1, … , X_{17}\) are iid uniform on \( (.5,.8) \). What is \({\mathbf{E}} [X_{(k)}] \) ?
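A possible check uses the standard fact that the \(k\)-th order statistic of \(n\) iid Uniform(0,1) variables is Beta\((k, n-k+1)\) with mean \(k/(n+1)\), then rescales to the interval \((.5,.8)\). The function name below is only illustrative.

```python
from fractions import Fraction

def expected_order_statistic(k, n, a, b):
    """E[X_(k)] for n iid Uniform(a, b) draws: a + (b - a) * k / (n + 1)."""
    return a + (b - a) * Fraction(k, n + 1)

a, b = Fraction(1, 2), Fraction(4, 5)
for k in (1, 9, 17):
    print(k, expected_order_statistic(k, 17, a, b))
```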
Order statistics I
Suppose \(X_1, … , X_n \stackrel{iid}{\sim} U(0,1) \). How large must \(n\) be to have that \({\mathbf{P}}(X_{(n)} \geq .95) \geq 1/2\) ?
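Since \({\mathbf{P}}(X_{(n)} \geq .95) = 1 - .95^n\) for iid uniforms, the answer can be found with a direct search. A minimal sketch (the function name is just for illustration):

```python
def smallest_n(threshold=0.95, target=0.5):
    """Smallest n with P(X_(n) >= threshold) = 1 - threshold**n >= target."""
    n = 1
    while 1 - threshold ** n < target:
        n += 1
    return n

print(smallest_n())
```

Equivalently, \(n\) is the smallest integer with \(n \geq \log(1/2)/\log(.95)\).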
Beta-binomial
You have a sequence of coin flips \(X_1,…,X_n\) drawn iid from a Bernoulli distribution with unknown parameter \(p\) and known fixed \(n\). Assume a priori that the coin’s parameter \(p\) follows a Beta distribution with parameters \(\alpha,\beta\).
- Given the sequence \(X_1,…,X_n\) what is the posterior pdf of \(p\) ?
- For what value of \(p\) is the maximum of the posterior pdf attained ?
Hint: If \(X\) is distributed Bernoulli(p) then for \(x=1,0\) one has \(P(X=x)=p^x(1-p)^{(1-x)}\). Furthermore, if \(X_1,X_2\) are i.i.d. Bernoulli(p) then
\[P(X_1=x_1, X_2=x_2 )=P(X_1=x_1)P(X_2=x_2 )=p^{x_1}(1-p)^{(1-x_1)}p^{x_2}(1-p)^{(1-x_2)}\]
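Multiplying the likelihood in the hint by the Beta prior gives a posterior proportional to \(p^{\alpha + s - 1}(1-p)^{\beta + n - s - 1}\), where \(s = \sum_i x_i\). The sketch below encodes that update and the usual Beta mode formula, assuming integer prior parameters and both posterior parameters greater than \(1\); the example data are made up.

```python
from fractions import Fraction

def beta_posterior(alpha, beta, xs):
    """Posterior Beta parameters for p after observing 0/1 outcomes xs."""
    s = sum(xs)
    return alpha + s, beta + len(xs) - s

def beta_mode(a, b):
    """Mode of Beta(a, b); this formula assumes a > 1 and b > 1."""
    return Fraction(a - 1, a + b - 2)

# Hypothetical prior Beta(2, 2) and four hypothetical flips.
a, b = beta_posterior(2, 2, [1, 0, 1, 1])
print(a, b, beta_mode(a, b))
```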
Car tires
The air pressures in the left and right front tires of a car are random variables \(X\) and \(Y\), respectively. Tires should be filled to 26psi. The joint pdf is
\( f(x,y) = K(x^2+y^2), \quad 20 \leq x,y \leq 30 \)
- What is \(K\) ?
- Are the random variables independent ?
- What is the probability that both tires are underfilled ?
- What is the probability that \( |X-Y| \leq 3 \) ?
- What are the marginal densities ?
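Because the density is a polynomial on a square, the normalizing constant and the probability that both tires are below 26psi can be computed exactly. A sketch using exact rational arithmetic (the helper name is illustrative):

```python
from fractions import Fraction

def box_integral(x_lo, x_hi, y_lo, y_hi):
    """Exact integral of x^2 + y^2 over [x_lo, x_hi] x [y_lo, y_hi]."""
    def cube(lo, hi):
        # Integral of t^2 from lo to hi.
        return Fraction(hi ** 3 - lo ** 3, 3)
    return cube(x_lo, x_hi) * (y_hi - y_lo) + cube(y_lo, y_hi) * (x_hi - x_lo)

K = 1 / box_integral(20, 30, 20, 30)              # density must integrate to 1
p_both_under = K * box_integral(20, 26, 20, 26)   # both pressures below 26
print(K, p_both_under, float(p_both_under))
```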
Joint of min and max
Let \(X_1,…,X_n \stackrel{iid}{\sim} \mbox{Exp}(\lambda) \)
Let \(V = \mbox{min}(X_1,…,X_n)\) and \(W = \mbox{max}(X_1,…,X_n)\).
What is the joint distribution of \(V,W\) ? Are they independent ?
Joint density part 1
Let \(X\) and \(Y\) have joint density
\(f(x,y) = 90(y-x)^8, \quad 0<x<y<1\)
- State the marginal distribution for \(X\)
- State the marginal distribution for \(Y\)
- Are these two random variables independent?
- What is \(\mathbf{P}(Y > 2X)\) ?
- Fill in the blanks “The density \(f(x,y)\) above is the joint density of the _________ and __________ of ten independent uniform \((0,1)\) random variables.”
[Adapted from Pitman pg 354]
Expectation of min of exponentials
There are \(15\) stock brokers. The returns (in thousands of dollars) of the brokers are modeled as independent exponential random variables \(X_1 \sim \mbox{Exp}(\lambda_1),…,X_{15} \sim \mbox{Exp}(\lambda_{15})\). Define \(Z = \min\{X_1,…,X_{15}\}\).
What is \(\mathbf{E}(Z)\) ?
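The minimum of independent exponentials is again exponential with rate \(\lambda_1 + \cdots + \lambda_{15}\), so \(\mathbf{E}(Z)\) is the reciprocal of that sum. The sketch below compares this with a simulation; the rates are hypothetical, since the problem leaves them symbolic.

```python
import random

def expected_min(rates):
    """E[min] of independent Exp(rate) variables: 1 / (sum of rates)."""
    return 1.0 / sum(rates)

def simulated_min_mean(rates, trials=50_000, seed=1):
    """Monte Carlo estimate of E[min] for comparison."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += min(rng.expovariate(r) for r in rates)
    return total / trials

rates = [0.1 * (i + 1) for i in range(15)]   # hypothetical lambda_1, ..., lambda_15
print(expected_min(rates), simulated_min_mean(rates))
```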
Two normals
A sequence \(X_1,…,X_n\) is drawn iid from either \(\mbox{N}(0,1)\) or \(\mbox{N}(0,10)\) with equal prior probability.
- State the formulae for the probabilities that the sequence came from the normal with mean \(1\) or mean \(10\).
- If you know the mean of the normal is \(1\) then what is the variance of \(S = \sum_i X_i\) and of \( \hat{\mu} = \frac{1}{n} \sum_i X_i\) ?
- What is \(\mbox{Pr}(Z > \max\{x_1,…,x_n\})\) if \(\mu =1\) and if \(\mu =10\) ?
Limit for mixtures
Consider the following mixture distribution.
- Draw \(X \sim \mbox{Be}(p=.3)\)
- If \(X=1\) then \(Y \sim \mbox{Geo}(p_1)\)
- If \(X= 0\) then \(Y \sim \mbox{Bin}(n,p_2)\)
Consider the sequence of random variables \(Y_1,…,Y_{200}\) drawn iid from the above random experiment.
Use the central limit theorem to state the distribution of \(S = \frac{1}{200} \sum_{i=1}^{200} Y_i\).
(Here \(\mbox{Be}(p)\) is the Bernoulli distribution with parameter \(p\) and \(\mbox{Geo}(p)\) is the geometric distribution with the parameter \(p\). )
Strontium
Assume we have a large number \(N\) of Strontium particles. The decay model for Strontium is exponential: \(\mathbf{P}(T > t) = e^{- \lambda t}\) is the probability that an atom survives beyond time \(t\).
- The half-life of a substance is the time it takes for an appreciable amount of the substance to be reduced by half. If the half-life of Strontium is 28 years, what is the decay parameter of the exponential ?
- What is the probability Strontium lasts at least 50 years, \(\mathbf{P}(T > 50) \) ?
- Suppose we have \(5\) radioactive substances whose decay times are modeled by exponential random variables \(X_1,…,X_5\) with parameters \(\lambda_1,…,\lambda_5\). Assume the five random variables are independent. What is the pdf of \(\min\{X_1,…,X_5\}\) ?
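The first two parts reduce to solving \(e^{-28\lambda} = 1/2\) and then evaluating the survival function. A minimal sketch, with time measured in years; the last helper uses the fact that the minimum of independent exponentials is exponential with the summed rate.

```python
import math

HALF_LIFE_YEARS = 28.0

# P(T > half-life) = 1/2 gives exp(-lam * 28) = 1/2, so lam = log(2) / 28.
lam = math.log(2) / HALF_LIFE_YEARS

def survival(t, rate=lam):
    """P(T > t) for an Exp(rate) lifetime."""
    return math.exp(-rate * t)

def min_pdf(t, rates):
    """pdf at t >= 0 of the minimum of independent exponentials: Exp(sum of rates)."""
    total = sum(rates)
    return total * math.exp(-total * t)

print(lam, survival(50))
```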
Expectation of hierarchical model
Consider the following hierarchical random variable
- \(\lambda \sim \mbox{Geometric}(p)\)
- \(Y \mid \lambda \sim \mbox{Poisson}(\lambda)\)
Expectation of geometric
Use the expectation as tail sum tool to compute the expectation of the geometric distribution.
Expectation of mixture distribution
Consider the following mixture distribution.
- Draw \(X \sim \mbox{Ber}(p=.3)\)
- If \(X=1\) then \(Y \sim \mbox{Geometric}(p_1)\)
- If \(X= 0\) then \(Y \sim \mbox{Bin}(n,p_2)\)
What is \(\mathbf{E}(Y)\) ? (*) What is \(\mathbf{E}(Y \mid X )\) ?
Maximum of die rolls
Let \(X_1,…,X_5\) be five iid rolls of a six-sided die. Let \(Z = \mbox{max}\{X_1,…,X_5\}\). Compute \(\mathbf{E}(Z)\).
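One route is the tail sum \(\mathbf{E}(Z) = \sum_{k=1}^{6} \mathbf{P}(Z \geq k)\) with \(\mathbf{P}(Z \geq k) = 1 - ((k-1)/6)^5\), assuming a fair die. A short exact computation along those lines:

```python
from fractions import Fraction

def expected_max(rolls=5, sides=6):
    """E[max] via the tail sum of P(max >= k) = 1 - ((k - 1) / sides) ** rolls."""
    return sum(1 - Fraction(k - 1, sides) ** rolls for k in range(1, sides + 1))

print(expected_max(), float(expected_max()))
```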
Sums of Poisson
A gambler bets ten times on events of probability \(1/10\), then twenty times on events with probability \(1/20\), then thirty times on events with probability \(1/30\), then forty times on events with probability \(1/40\). Assuming the events are independent, what is the approximate distribution of the number of times the gambler wins ? (use Poisson approx. of binomial)
[Pitman 2.5, pg 227]
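Each block of bets has mean \(1\), so the Poisson approximation suggests a Poisson distribution with mean \(4\) for the total. The sketch below builds the exact distribution by convolving the four binomials and measures the largest pointwise gap to Poisson(4); the helper names are illustrative.

```python
import math

def binom_pmf(n, p):
    """PMF of Binomial(n, p) as a list indexed by the number of wins."""
    return [math.comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a, b):
    """PMF of the sum of two independent counts with PMFs a and b."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

exact = [1.0]
for n in (10, 20, 30, 40):
    exact = convolve(exact, binom_pmf(n, 1 / n))

poisson4 = [math.exp(-4) * 4 ** k / math.factorial(k) for k in range(len(exact))]
max_gap = max(abs(e - q) for e, q in zip(exact, poisson4))
print(max_gap)
```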
Polonium data
Look at the table at the following link, which summarizes the radioactive decay counts of polonium recorded by Rutherford and Geiger (1910): the number of scintillations in each of 2608 intervals of 1/8 minute. For example, 57 intervals had zero counts. The counts can be thought of as approximately Poisson distributed.
- Use the fact that for the Poisson distribution \( \mathbf{E}[X] = \lambda \) to estimate the rate parameter. This is using the method of moments to estimate a parameter.
- Maximize the likelihood to estimate \( \lambda\).
Random stock brokers
There are \(15\) stock brokers. The returns (in thousands of dollars) on the brokers are modeled
\( X_1,…,X_{15} \stackrel{iid}{\sim} \mbox{N}(0,1).\)
Under the above model, what is the probability that at least one broker brings in more than $1000 ?
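Since returns are in thousands of dollars, the event is that the maximum of the fifteen standard normals exceeds \(1\), which has probability \(1 - \Phi(1)^{15}\). A sketch using only the standard library:

```python
import math

def std_normal_cdf(z):
    """Standard normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def prob_at_least_one_exceeds(threshold, n):
    """P(max of n iid N(0,1) > threshold) = 1 - Phi(threshold) ** n."""
    return 1.0 - std_normal_cdf(threshold) ** n

# $1000 corresponds to a threshold of 1 in units of thousands of dollars.
print(prob_at_least_one_exceeds(1.0, 15))
```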
Conditional Poisson
The following is a hierarchical model.
- \(\lambda \sim \mbox{Uniform}[1,2]\)
- \(Y \mid \lambda \sim \mbox{Poisson}(\lambda)\)
What is \(\mathbf{E}(Y)\) ?
Mixture of Poisson
The following is a mixture model, in which a random variable \(Y\) is drawn as follows. With probability \(p\) draw from a Poisson distribution with parameter \(\lambda = 1\), and with probability \(1-p\) draw from a Poisson distribution with parameter \(\lambda =2 \).
What is \(\mathbf{E}(Y)\) ?
Variance of binomial vs. hypergeometric
Given \(N\) balls, of which \(r\) are red and the rest are green, draw \(n\) balls. Denote by \(X\) the number of red balls drawn when sampling with replacement and by \(Y\) the number of red balls drawn when sampling without replacement.
- What is the difference between the variance of \(X\) and the variance of \(Y\) ?
- For what values of \(N,n,r\) is the variance the largest ?
Expectation of geometric distribution
Compute the expectation of the geometric distribution using the fact that in this case
\(\mathbf{E}(X)= \sum_{k=1}^{\infty} \mathbf{Pr}(X\geq k) \)
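As a numerical check of the tail-sum formula, the sketch below assumes the geometric distribution counts the number of trials up to and including the first success, so that \(\mathbf{Pr}(X \geq k) = (1-p)^{k-1}\); that convention is an assumption, since the problem does not state which one it uses.

```python
def geometric_mean_by_tail_sum(p, terms=10_000):
    """Partial tail sum of P(X >= k) = (1 - p) ** (k - 1) for k = 1..terms."""
    return sum((1 - p) ** (k - 1) for k in range(1, terms + 1))

print(geometric_mean_by_tail_sum(0.25))   # close to 1 / 0.25
```

The partial sums are a geometric series, which is what makes the closed form \(1/p\) fall out.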
Expectations of die rolls
A fair die is rolled ten times. Find numerical values for the expectations of each of the following random variables
- the sum of the numbers in the ten rolls;
- the sum of the largest two numbers in the first three rolls;
- the maximum number in the first five rolls;
- the number of multiples of three in the first ten rolls;
- the number of faces which fail to appear in the ten rolls;
- the number of different faces that appear in the ten rolls.
[From Pitman page 183]
Subsequence problem
Scoring subsequences or lengths of similar matches or runs is common to a variety of problems from matches in genetic codes to similar runs in bits.
Consider the following question about two sequences of letters. Set both sequences to have length \(k\). At each location of the sequences the probability of a match in letters is \(.7\) and the probability of a mismatch is \(.3\). At each location a match is assigned a score of \(4\) and a mismatch is assigned a score of \(-1\). The total score of the sequence is the sum of the scores over the \(k\) locations.
Answer the following:
- What is the PMF of the total score if \(k=5\) ?
- What is the PMF of the total score for a general \(k\) ?
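If the locations are independent (an assumption the problem leaves implicit), the number of matches \(M\) is Binomial\((k, .7)\) and the total score is \(4M - (k - M) = 5M - k\). A sketch that tabulates the resulting PMF:

```python
import math

def score_pmf(k, p_match=0.7, match_score=4, mismatch_score=-1):
    """PMF of the total score, assuming independent locations."""
    pmf = {}
    for m in range(k + 1):   # m = number of matching locations
        score = match_score * m + mismatch_score * (k - m)
        pmf[score] = math.comb(k, m) * p_match ** m * (1 - p_match) ** (k - m)
    return pmf

for score, prob in sorted(score_pmf(5).items()):
    print(score, round(prob, 5))
```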
Mark-recapture
A common problem in ecology, social networks, and marketing is estimating the population of a particular species or type. The mark-recapture method is a classic approach to estimating the population.
Assume we want to estimate the population of sturgeon in a section of the Hudson river. We use the following procedure:
- Capture and mark \(h\) sturgeons
- Recapture \(n\) sturgeon and you find that \(y\) of them are marked
- The estimated sturgeon population is \(N = \frac{h n}{y} \).
Motivate statement \(3\) using the hypergeometric distribution.
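One way to see the connection: for fixed \(h\), \(n\), \(y\), the hypergeometric likelihood as a function of the population size \(N\) increases while \(N \leq hn/y\) and decreases afterwards, so it peaks at the integer part of \(hn/y\). The sketch below checks this by brute force on hypothetical numbers (\(h=100\), \(n=50\), \(y=7\)); the search cap `n_max` is also an arbitrary choice.

```python
import math

def likelihood(N, h, n, y):
    """Hypergeometric P(y marked among n recaptured | population N, h marked)."""
    if N < h + n - y:
        return 0.0   # not enough unmarked fish to explain the sample
    return math.comb(h, y) * math.comb(N - h, n - y) / math.comb(N, n)

def mle_population(h, n, y, n_max=5000):
    """Population size in [max(h, n), n_max] that maximizes the likelihood."""
    return max(range(max(h, n), n_max + 1), key=lambda N: likelihood(N, h, n, y))

print(mle_population(h=100, n=50, y=7), 100 * 50 / 7)
```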
Digital communications system
A digital communications system consists of a transmitter and a receiver. During each short transmission interval the transmitter sends a signal which is interpreted as a zero, or it sends a different signal which is to be interpreted as a one. At the end of each interval, the receiver makes its best guess at what is transmitted. Consider the events:
\(T_0 = \{\mbox{Transmitter sends } 0\}, \quad T_1 = \{\mbox{Transmitter sends } 1\} \)
\(R_0 = \{\mbox{Receiver perceives } 0\}, \quad R_1 = \{\mbox{Receiver perceives } 1\} \)
Assume that \(\mathbf{P}(R_0 \mid T_0)=.99\), \(\mathbf{P}(R_1 \mid T_1)=.98\) and \(\mathbf{P}(T_1)=.5\).
- Compute probability of transmission error given \(R_1\).
- Compute the overall probability of a transmission error.
- Repeat the first two parts for \(\mathbf{P}(T_1)=.8\).
[Pitman page 54, problem 4]
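The computation is an application of Bayes' rule and the law of total probability. A sketch that handles both values of \(\mathbf{P}(T_1)\); the function and variable names are illustrative:

```python
def error_probabilities(p_t1, p_r0_given_t0=0.99, p_r1_given_t1=0.98):
    """Return (P(error | R_1), overall P(error)) for a given P(T_1)."""
    p_t0 = 1.0 - p_t1
    p_r1_given_t0 = 1.0 - p_r0_given_t0   # a 0 was sent but a 1 is perceived
    p_r0_given_t1 = 1.0 - p_r1_given_t1   # a 1 was sent but a 0 is perceived
    p_r1 = p_r1_given_t0 * p_t0 + p_r1_given_t1 * p_t1   # total probability
    error_given_r1 = p_r1_given_t0 * p_t0 / p_r1         # Bayes: P(T_0 | R_1)
    overall_error = p_r1_given_t0 * p_t0 + p_r0_given_t1 * p_t1
    return error_given_r1, overall_error

for p_t1 in (0.5, 0.8):
    print(p_t1, error_probabilities(p_t1))
```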
Committee membership in the senate
The senate contains 100 members; 51 are Democrats (or caucus with Democrats) and 49 are Republicans. A committee of 10 members is chosen at random.
- Compute the probability that there are \(n\) Republicans on the committee for \(n=1,…,10\).
- Find the probability that the committee members are all the same party.
- Suppose you didn’t know how many Democrats there were in the senate. You observe that the committee of \(10\) members consists of \(k=7\) Democrats. Compute \(\mathbf{P}(M \mid k=7) \), where \(M\) is the number of Democrats in the senate.
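The first two parts are hypergeometric computations and can be done exactly, as sketched below. The third part additionally needs a prior on \(M\), which the problem does not specify, so it is not covered here.

```python
from fractions import Fraction
from math import comb

def republicans_pmf(n, democrats=51, republicans=49, size=10):
    """P(exactly n Republicans on a randomly chosen committee), hypergeometric."""
    total = democrats + republicans
    return Fraction(comb(republicans, n) * comb(democrats, size - n), comb(total, size))

pmf = {n: republicans_pmf(n) for n in range(11)}
p_same_party = pmf[0] + pmf[10]   # all Democrats or all Republicans
print(float(p_same_party))
```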
Polya’s urn
An urn contains \(4\) white balls and \(6\) black balls. A ball is chosen at random, and its color is noted. The ball is then replaced, along with \(3\) more balls of the same color. Then another ball is drawn at random from the urn.
- Find the chance that the second ball drawn is white.
- Given the second ball drawn is white, what is the probability that the first ball drawn is black ?
- Suppose the original contents of the urn are \(w\) white and \(b\) black balls, and that after drawing a ball we replace it along with \(d\) more balls of the same color. What is the probability that the second ball drawn is white (it should be \(\frac{w}{w+b}\) )?
[Pitman page 53. Problem 2]
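Conditioning on the color of the first ball gives all three answers. A sketch with exact fractions, where \(d\) is the number of extra balls added (so \(d=3\) in the first two parts):

```python
from fractions import Fraction

def second_draw(w, b, d):
    """Return (P(2nd white), P(1st black | 2nd white)).

    The drawn ball is replaced along with d more balls of its color.
    """
    total = w + b
    white_then_white = Fraction(w, total) * Fraction(w + d, total + d)
    black_then_white = Fraction(b, total) * Fraction(w, total + d)
    p_second_white = white_then_white + black_then_white
    return p_second_white, black_then_white / p_second_white

print(second_draw(4, 6, 3))
```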