Home Academia Contact Archive

PaC Exercise 2.12

Posted on February 18, 2021
import numpy as np

Exercise 2.12

We draw cards uniformly at random with replacement from a deck of n cards. What is the expected number of cards we must draw until we have seen all n cards in the deck? If we draw 2n cards, what is the expected number of cards in the deck that are not chosen at all? Chosen exactly once?

Solution: Let X be the random variable of after how many draws we have seen all cards and let Xi, i = 1, …, n, be the number of cards drawn until we see the ith unique card. Then we have


$$ X = \sum_{i=1}^n X_i. $$

Xi, i = 1, …, n, is geometric distributed with probability of success $p_i := 1 - \frac{i-1}{n}$. Therefore,


$$ \begin{align*} P(X_i = k) &= (1 - p_i)^{k-1} p_i,\\ P(X_i \geq k) &= (1 - p_i)^{k-1}. \end{align*} $$

Recall that the expectation value of a geometric distributed random variable can be calculated by


$$ E[X_i] = \sum_{k=1}^\infty P(X_i \geq k) = \sum_{k=1}^\infty (1 - p_i)^{k-1} = \frac{1}{1-(1-p_i)} = \frac{1}{p_i}. $$

Therefore,


$$ E[X] = \sum_{i=1}^n E[X_i] = \sum_{i=1}^n \frac{1}{p_i} = \sum_{i=1}^n \frac{1}{1 - \frac{i-1}{n}} = n \sum_{i=1}^n \frac{1}{n-i+1} = n \sum_{i=1}^n \frac{1}{i} = n \cdot H(n), $$

where H(n) denotes the nth harmonic number.

In particular, for a n = 52 card deck we get


$$ E[X] = 52 \sum_{i=1}^{52} \frac{1}{i} = 235.978\dots. $$

We can verify this by a simple (but inefficient) simulation with the following code.

n = 52
num_samples = 10000

m = 20*n  # max. drawn per experiment


def calculate_rv_X(a, n):
    for i in range(1, len(a)+1):
        if len(set(a[:i])) == n:
            return i
    raise RuntimeError("Draws exceeded. Try again or increase m.")
    
samples = np.random.randint(low=0, high=n, size=(num_samples, m))

sim_X = np.array([calculate_rv_X(s, n) for s in samples])
sim_X.mean()
236.4747