PaC Exercise 2.12
import numpy as np
Exercise 2.12
We draw cards uniformly at random with replacement from a deck of n cards. What is the expected number of cards we must draw until we have seen all n cards in the deck? If we draw 2n cards, what is the expected number of cards in the deck that are not chosen at all? Chosen exactly once?
Solution: Let X be the random variable of after how many draws we have seen all cards and let Xi, i = 1, …, n, be the number of cards drawn until we see the ith unique card. Then we have
$$
X = \sum_{i=1}^n X_i.
$$
Xi, i = 1, …, n, is geometric distributed with probability of success $p_i := 1 - \frac{i-1}{n}$. Therefore,
$$
\begin{align*}
P(X_i = k) &= (1 - p_i)^{k-1} p_i,\\
P(X_i \geq k) &= (1 - p_i)^{k-1}.
\end{align*}
$$
Recall that the expectation value of a geometric distributed random variable can be calculated by
$$
E[X_i] = \sum_{k=1}^\infty P(X_i \geq k) = \sum_{k=1}^\infty (1 - p_i)^{k-1} = \frac{1}{1-(1-p_i)} = \frac{1}{p_i}.
$$
Therefore,
$$
E[X] = \sum_{i=1}^n E[X_i] = \sum_{i=1}^n \frac{1}{p_i} = \sum_{i=1}^n \frac{1}{1 - \frac{i-1}{n}}
= n \sum_{i=1}^n \frac{1}{n-i+1} = n \sum_{i=1}^n \frac{1}{i} = n \cdot H(n),
$$
where H(n) denotes the nth harmonic number.
In particular, for a n = 52 card deck we get
$$
E[X] = 52 \sum_{i=1}^{52} \frac{1}{i} = 235.978\dots.
$$
We can verify this by a simple (but inefficient) simulation with the following code.
= 52
n = 10000
num_samples
= 20*n # max. drawn per experiment
m
def calculate_rv_X(a, n):
for i in range(1, len(a)+1):
if len(set(a[:i])) == n:
return i
raise RuntimeError("Draws exceeded. Try again or increase m.")
= np.random.randint(low=0, high=n, size=(num_samples, m))
samples
= np.array([calculate_rv_X(s, n) for s in samples])
sim_X sim_X.mean()
236.4747