CMSC 27100 — Lecture 18

Probability

Now that we have spent some time on learning how to enumerate outcomes, we can start to think about the likelihood of particular outcomes. This is where probability theory comes into play. In particular, we are interested in discrete probability, which deals with countably many outcomes and is attacked using many of the tools that we've been using. The other branch of probability theory deals with uncountably many outcomes and is approached using methods from calculus and analysis. For instance, where we can get away with summation, you will see a lot of integration when working in continuous probability theory.

At first glance, even though we're primarily concerned with discrete probability, it seems a bit strange that we'd try to stuff probability theory into this course which is already chock full of other things to cover. But probability and randomness play an important role in computer science and particularly in theoretical computer science. Probabilistic methods are an important part of analysis of algorithms and algorithm design. There are some problems for which we may not know of an efficient deterministic algorithm, but we can come up with a provably efficient randomized algorithm. (To be honest, these are few and far between. Primality testing used to be a prime example of such a problem, but no longer is.) Probability also plays a large role in cryptography, machine learning and quantum computing. To study these topics, we need a basic understanding of discrete probability.

Definition 18.1. A probability space $(\Omega,\Pr)$ is a finite set $\Omega \neq \emptyset$ together with a function $\Pr : \Omega \to \mathbb R^+$ such that

  1. $\forall \omega \in \Omega, \Pr(\omega) \gt 0$,
  2. $\sum_{\omega \in \Omega} \Pr(\omega) = 1$.

We say that $\Omega$ is the sample space and $\Pr$ is the probability distribution. The above two conditions are known as the probability axioms and are due to Andrey Kolmogorov in the 1930s. Kolmogorov was a Soviet mathematician who was also particularly influential in computer science, having developed notions of algorithmic information theory in the 1960s. It might be a bit surprising to learn (as I did) that the formalization of basic probability concepts didn't come about until the 20th century, despite there being analyses of games of chance dating back to 17th century France.

An event is a subset $A \subseteq \Omega$. We extend $\Pr$ to a function $\Pr : \mathcal P(\Omega) \to \mathbb R$ by defining $\Pr(A) = \sum_{\omega \in A} \Pr(\omega)$; this is the probability of $A$. The atomic events are the singleton sets $\{\omega\}$, with $\Pr(\{\omega\}) = \Pr(\omega)$, and the trivial events are the events with probability 0 ($\emptyset$) or 1 ($\Omega$).

Example 18.2. A sample space for flipping a coin would be the two ways it can land, heads or tails: $\{H,T\}$. A sample space for the outcome of a die roll would be $\{1,2,3,4,5,6\}$. An event can be something simple like flipping a coin and it landing on heads, in which case the event would be the set $\{H\}$. It can also be more complicated, like a die roll landing on a perfect square, which would be the set $\{1,4\}$, or perhaps on a positive divisor of 6, which would be the set $\{1,2,3,6\}$.
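To make these definitions concrete, here is a minimal sketch in Python, encoding the die sample space and events from Example 18.2 and checking the two axioms of Definition 18.1. (The names `pr` and `event_prob` are our own choices, not from any standard library.)

```python
from math import isclose

# Sample space and distribution for one roll of a fair die.
omega = {1, 2, 3, 4, 5, 6}
pr = {w: 1/6 for w in omega}

# The two probability axioms from Definition 18.1.
assert all(p > 0 for p in pr.values())
assert isclose(sum(pr.values()), 1.0)

def event_prob(pr, A):
    """Probability of an event A: sum the probabilities of its outcomes."""
    return sum(pr[w] for w in A)

print(event_prob(pr, {1, 4}))        # die lands on a perfect square: 1/3
print(event_prob(pr, {1, 2, 3, 6}))  # die lands on a divisor of 6: 2/3
```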

Definition 18.3. The uniform distribution over a sample space $\Omega$ defines $\Pr(\omega) = \frac 1 {|\Omega|}$ for all $\omega \in \Omega$. Thus, $\Pr(A) = \frac{|A|}{|\Omega|}$.
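As a quick sketch (again with names of our own choosing), the uniform distribution is one line of code; using exact fractions avoids floating-point error and lets us check $\Pr(A) = \frac{|A|}{|\Omega|}$ exactly:

```python
from fractions import Fraction

def uniform(omega):
    """Assign every outcome the probability 1/|omega|."""
    omega = set(omega)
    return {w: Fraction(1, len(omega)) for w in omega}

pr = uniform(range(1, 7))
A = {1, 4}
# Under the uniform distribution, Pr(A) = |A| / |Omega|.
assert sum(pr[w] for w in A) == Fraction(len(A), 6)
```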

Example 18.4. Recall that a standard deck of playing cards consists of four suits, $\spadesuit, \heartsuit, \clubsuit, \diamondsuit$, each with cards numbered 2 through 10 together with a Jack, Queen, King, and Ace. This gives us 52 cards in total ($4 \times 13$). What is the probability that a poker hand contains a full house (i.e. a three of a kind together with a pair)? First, let's consider what our sample space is. Since we're concerned with possible poker hands, this should be all of the 5-card hands from our 52 cards. In this case, the size of our sample space is $$|\Omega| = \binom{52}5 = \frac{52 \cdot 51 \cdot 50 \cdot 49 \cdot 48}{5 \cdot 4 \cdot 3 \cdot 2 \cdot 1} = 2598960.$$ Now, let $F$ denote the event for full houses. First, we need to choose two denominations. These must be distinct, since otherwise we would need five cards of the same denomination, and each denomination has only four. We also need to distinguish between the denomination for the three of a kind and the denomination for the pair, so this is a 2-permutation and not a combination, which gives us $P(13,2)$. From this selection, we have $\binom 4 3$ ways to choose a three of a kind for one denomination and $\binom 4 2$ ways to choose a pair from the other. Putting this all together, we get $$|F| = P(13,2) \binom 4 3 \binom 4 2 = (13 \cdot 12) \cdot \frac{4 \cdot 3 \cdot 2}{3 \cdot 2 \cdot 1} \cdot \frac{4 \cdot 3}{2 \cdot 1} = 13 \cdot 12 \cdot 4 \cdot 6 = 3744.$$ Then the probability that we get a full house is $$\Pr(F) = \frac{|F|}{|\Omega|} = \frac{3744}{2598960} \approx 0.00144.$$
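We can verify both the arithmetic and the counting argument in Python: `math.comb` and `math.perm` compute $\binom nk$ and $P(n,k)$, and an exhaustive check over all $\binom{52}5$ hands (which takes a little while to run) confirms the count of 3744. The deck encoding below is our own.

```python
from math import comb, perm
from itertools import combinations
from collections import Counter

hands = comb(52, 5)                                  # 2598960
full_houses = perm(13, 2) * comb(4, 3) * comb(4, 2)  # 3744
print(full_houses / hands)                           # ~0.00144

# Exhaustive cross-check: a hand is a full house exactly when its
# denomination counts are {3, 2}. (Slow: iterates over ~2.6 million hands.)
deck = [(denom, suit) for denom in range(13) for suit in range(4)]
count = sum(1 for hand in combinations(deck, 5)
            if sorted(Counter(d for d, _ in hand).values()) == [2, 3])
assert count == full_houses
```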

So, under the uniform distribution, determining the probability of a single event amounts to counting the possible outcomes, i.e., the sizes of the sets for the event and the sample space. And since events are sets, we can combine them in the same ways we've been doing to consider the probability of multiple events together.

Theorem 18.5. $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.

You might notice that this corresponds to the Inclusion-Exclusion Principle from set theory. It leads to the following simple, but very useful theorem:

Theorem 18.6 (Union Bound). $\Pr(A \cup B) \leq \Pr(A) + \Pr(B)$. This will be an equality precisely when the intersection of $A$ and $B$ is empty. We call these events disjoint.

Definition 18.7. Events $A$ and $B$ are disjoint if $A \cap B = \emptyset$.

We can generalize the union bound to the union of an arbitrary number of events:

Theorem 18.8 (Boole's Inequality). $\Pr \left( \bigcup_{i=1}^k A_i \right) \leq \sum_{i=1}^k \Pr(A_i)$. Furthermore, this is an equality if and only if the $A_i$ are pairwise disjoint.

This is an easy theorem to prove using induction. As you might have noticed, this result is due to George Boole, who is the same Boole that gave us Boolean algebra.
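Since events are just finite sets, Theorems 18.5 through 18.8 can be sanity-checked directly; here is a small sketch over a single fair die roll, reusing the events from Example 18.2:

```python
from fractions import Fraction

omega = set(range(1, 7))
pr = lambda A: Fraction(len(A), len(omega))  # uniform distribution

A = {1, 4}          # perfect squares
B = {1, 2, 3, 6}    # divisors of 6

# Theorem 18.5 (inclusion-exclusion).
assert pr(A | B) == pr(A) + pr(B) - pr(A & B)

# Theorem 18.6 (union bound); strict here, since A and B share the outcome 1.
assert pr(A | B) < pr(A) + pr(B)

# Theorem 18.8 (Boole's inequality), with equality for pairwise disjoint events.
events = [{1, 2}, {3, 4}, {5, 6}]
assert pr(set().union(*events)) == sum(pr(E) for E in events)
```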

Something that doesn't quite have a direct analogue from our discussions on set theory is the notion of conditional probability.

Definition 18.9. If $A$ and $B$ are events with $\Pr(B) > 0$, then the conditional probability of $A$ relative to $B$ is $$\Pr(A \mid B) = \frac{\Pr(A \cap B)}{\Pr(B)}.$$ Note that by this definition, $\Pr(A \cap B) = \Pr(A \mid B) \cdot \Pr(B)$.

One way of looking at this is to restrict our attention to the event $B$ and ask: what is the probability of $A$ within this smaller space? The outcomes of $A$ that remain possible are exactly those in $A \cap B$. Then, since $B$ is now playing the role of the sample space, we rescale by dividing by the probability of $B$, so that the probabilities within $B$ sum to 1.
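Definition 18.9 transcribes directly into code; the helper below (a sketch, with names of our own choosing) also enforces the requirement that $\Pr(B) > 0$:

```python
from fractions import Fraction

def event_prob(pr, A):
    return sum(pr[w] for w in A)

def cond_prob(pr, A, B):
    """Pr(A | B) = Pr(A intersect B) / Pr(B), defined only when Pr(B) > 0."""
    if event_prob(pr, B) == 0:
        raise ValueError("cannot condition on a probability-zero event")
    return event_prob(pr, A & B) / event_prob(pr, B)

# One fair die: probability the roll is even, given that it is greater than 3.
pr = {w: Fraction(1, 6) for w in range(1, 7)}
print(cond_prob(pr, {2, 4, 6}, {4, 5, 6}))  # 2/3
```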

Example 18.10. Consider a roll of two dice.

  1. What is the probability that the sum of the dice is 4?
  2. What is the probability that the first die shows 3?
  3. What is the probability that the sum of the dice is 4, given that the first die shows 3?
  4. What is the probability that the first die shows 3, given that the sum of the dice is 4?

Here, if we want to talk about the "first" die, we need our dice to be distinguishable. This means that our sample space consists of ordered pairs recording the outcomes of the two dice after a roll. Each of the two dice can take on a value from 1 through 6, giving us $6^2 = 36$ possible outcomes.

  1. There are only three satisfying outcomes, $(1,3)$, $(2,2)$, and $(3,1)$, giving a $\frac{3}{36} = \frac{1}{12}$ chance.
  2. Here, we fix one of the dice, leaving the other one free. This gives exactly $1 \cdot 6 = 6$ outcomes, so the probability of this event is $\frac{6}{36} = \frac 1 6$.
  3. This is a conditional probability question. For convenience, let $A$ be the event "sum of the dice is 4" and let $B$ be the event "first die shows 3". We know that the probability of $B$ is $\frac{1}{6}$ and $\Pr(A \cap B) = \Pr(\{(3,1)\}) = \frac{1}{36}$. Hence the conditional probability is $\frac{1/36}{1/6} = \frac{1}{6}$. Of course, we could have just looked at $B = \{(3,1), (3,2), (3,3), (3,4), (3,5), (3,6)\}$ and noticed that exactly one of the six possibilities corresponds to $A$. That's really what we're doing when we compute conditional probabilities!
  4. Turning things around, $A = \{(1,3), (2,2), (3,1)\}$, of which exactly one outcome satisfies $B$, immediately giving us the answer $\frac{1}{3}$. This agrees with the formula: now the denominator is $\Pr(A) = \frac{1}{12}$, and $\Pr(B \mid A) = \frac{1/36}{1/12} = \frac{1}{3}$.
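All four answers can be confirmed by enumerating the 36 equally likely rolls, which is exactly the "restrict and renormalize" computation described above (variable names are our own):

```python
from fractions import Fraction
from itertools import product

rolls = set(product(range(1, 7), repeat=2))  # 36 ordered pairs
pr = lambda E: Fraction(len(E), len(rolls))  # uniform distribution

A = {r for r in rolls if r[0] + r[1] == 4}   # sum of the dice is 4
B = {r for r in rolls if r[0] == 3}          # first die shows 3

print(pr(A))              # 1/12
print(pr(B))              # 1/6
print(pr(A & B) / pr(B))  # Pr(A | B) = 1/6
print(pr(A & B) / pr(A))  # Pr(B | A) = 1/3
```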