Category Archives: hypergeometric distribution

Variance of binomial vs. hypergeometric

Suppose an urn contains \(N\) balls, of which \(r\) are red and the rest are green, and we draw \(n\) balls from it. Let \(X\) be the number of red balls drawn when sampling with replacement and \(Y\) the number of red balls drawn when sampling without replacement.

  1. What is the difference between the variance of \(X\) and the variance of \(Y\)?
  2. For what values of \(N, n, r\) is the variance largest?
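
A numerical check can be useful before attempting the derivation. The sketch below computes both variances exactly from the probability mass functions, assuming \(n \le N\) draws in total. The helper names `variance`, `binomial_pmf`, and `hypergeometric_pmf` are chosen here for illustration and do not come from any library.

```python
from fractions import Fraction
from math import comb


def variance(pmf):
    """Exact variance of a distribution given as {value: probability}."""
    mean = sum(k * p for k, p in pmf.items())
    return sum((k - mean) ** 2 * p for k, p in pmf.items())


def binomial_pmf(N, r, n):
    """Distribution of X: n draws with replacement, r red balls among N."""
    p = Fraction(r, N)
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def hypergeometric_pmf(N, r, n):
    """Distribution of Y: n draws without replacement, r red balls among N."""
    total = comb(N, n)
    return {
        k: Fraction(comb(r, k) * comb(N - r, n - k), total)
        for k in range(n + 1)
    }


# Illustrative values only: N = 20 balls, r = 8 red, n = 5 draws.
var_x = variance(binomial_pmf(20, 8, 5))
var_y = variance(hypergeometric_pmf(20, 8, 5))
print(var_x, var_y, var_x - var_y)  # 6/5 18/19 24/95
```

These values agree with the standard closed forms \(\mathrm{Var}(X) = n p (1-p)\) and \(\mathrm{Var}(Y) = n p (1-p) \frac{N-n}{N-1}\) with \(p = r/N\), so trying a few other triples \((N, n, r)\) is a reasonable way to build intuition for both questions.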




A common problem in ecology, social networks, and marketing is estimating the size of a population of a particular species or type. The mark-recapture method is a classic approach to this problem.

Suppose we want to estimate the population of sturgeon in a section of the Hudson River. We use the following procedure:

  1. Capture and mark \(h\) sturgeon.
  2. Recapture \(n\) sturgeon and find that \(y\) of them are marked.
  3. The estimated sturgeon population is \(N = \frac{hn}{y}\).

Motivate statement \(3\) using the hypergeometric distribution.
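
One possible line of reasoning, sketched here under the assumption that the recapture is a simple random sample without replacement: the number of marked fish in the recapture is hypergeometric with mean \(nh/N\), and setting this equal to the observed \(y\) gives \(N = hn/y\). The same value also appears as the maximizer of the hypergeometric likelihood, which the Python sketch below checks by brute force. The function names and the search bound `n_max` are hypothetical choices made for this illustration.

```python
from fractions import Fraction
from math import comb


def hypergeom_likelihood(N, h, n, y):
    """P(y marked in a sample of n) when h of the N fish are marked."""
    if N < h + (n - y):
        # Fewer fish than the marked ones plus the unmarked ones we saw.
        return Fraction(0)
    return Fraction(comb(h, y) * comb(N - h, n - y), comb(N, n))


def mle_population(h, n, y, n_max=None):
    """Brute-force search for the N that maximizes the likelihood."""
    if y == 0:
        raise ValueError("need at least one marked fish in the recapture")
    lo = h + (n - y)  # smallest population consistent with the data
    hi = n_max if n_max is not None else 10 * h * n // y  # arbitrary cap
    return max(range(lo, hi + 1), key=lambda N: hypergeom_likelihood(N, h, n, y))


# Illustrative numbers only: mark 100, recapture 60, find 7 marked.
h, n, y = 100, 60, 7
print(mle_population(h, n, y), h * n / y)
```

For these numbers the search returns 857, the integer part of \(hn/y \approx 857.14\). When \(hn/y\) is itself an integer, the likelihood takes the same value at \(hn/y\) and \(hn/y - 1\), so the maximizer is not unique.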