Conditional Probability and
Bayes’ Theorem

Applied Statistics

MTH-361A | Spring 2026 | University of Portland

Objectives

Why Learn Conditional Probability in Statistics?

Understanding Associations:

Basis for Statistical Inference:

Improves Decision-Making:

Clarifies Intuition About Data:

Sampling with Replacement

Sampling with replacement is a sampling method where each selected unit is returned to the population before the next selection. Each selection is independent of all previous selections.

Characteristics:

Sampling without Replacement

Sampling without replacement is a sampling method where each selected unit is not returned to the population before the next draw. Each selection depends on the previous selections because the population size decreases with every draw.

Characteristics:

With vs Without Replacement

Feature                | Sampling with Replacement | Sampling without Replacement
Independence           | Yes                       | No
Duplication            | Yes                       | No
Changing Sample Space  | No                        | Yes
Changing Probabilities | No                        | Yes
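The contrast in the table can be seen directly in Python: `random.choices` samples with replacement, while `random.sample` samples without replacement. A minimal sketch (the population of ten labels is an arbitrary illustration):

```python
import random

random.seed(1)  # for reproducibility of the illustration
population = list(range(10))

# With replacement: each draw uses the full population, so duplicates can occur.
with_repl = random.choices(population, k=5)

# Without replacement: the population effectively shrinks, so all draws are distinct.
without_repl = random.sample(population, k=5)

print(with_repl)
print(without_repl)
print(len(set(without_repl)) == 5)  # True: all five draws are distinct
```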

Drawing Cards (with Replacement)

You draw cards from a standard \(52\)-card deck.

With Replacement:

\(\star\) Outcomes do not affect future draws.

Example:

Drawing Cards (without Replacement)

You draw cards from a standard 52-card deck.

Without Replacement:

\(\star\) Outcomes change future probabilities.

Example:

Multiplication Rule:

\[ \begin{aligned} P(\text{Ace}_1 \text{ and Ace}_2) & = P(\text{Ace}_1) P(\text{Ace}_2 \text{ given } \text{Ace}_1) \\ & = \left( \frac{4}{52} \right) \left( \frac{3}{51} \right) \\ & = \frac{1}{221} \\ P(\text{Ace}_1 \text{ and Ace}_2) & \approx 0.0045 \end{aligned} \]
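As a quick check, the multiplication-rule computation above can be reproduced exactly with Python's `fractions` module:

```python
from fractions import Fraction

# Exact probability of two aces without replacement,
# via the multiplication rule for dependent events.
p_ace1 = Fraction(4, 52)             # P(Ace on 1st draw)
p_ace2_given_ace1 = Fraction(3, 51)  # P(Ace on 2nd draw | Ace on 1st)

p_both = p_ace1 * p_ace2_given_ace1
print(p_both)         # 1/221
print(float(p_both))  # approximately 0.0045
```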

Conditional Probability

Why Conditional Probability?

What is Conditional Probability?

Definition:

\(\star\) Conditional probability helps us update probabilities based on new information.

Multiplication Rule

Independence: Given events \(A\) and \(B\) with non-zero probability in the sample space \(S\), the following are equivalent:

Dependence: Given events \(A\) and \(B\), then by definition:

Not Always Symmetric:

\[P(A|B) \ne P(B|A)\]

One Ball from Jars

Set-up:

      | \(J_1\) | \(J_2\)
\(G\) | \(14\)  | \(14\)
\(B\) | \(11\)  | \(6\)
Sum   | \(25\)  | \(20\)

Based on the given information:

      | \(J_1\)                      | \(J_2\)
\(G\) | \(P(G|J_1) = \frac{14}{25}\) | \(P(G|J_2) = \frac{14}{20}\)
\(B\) | \(P(B|J_1) = \frac{11}{25}\) | \(P(B|J_2) = \frac{6}{20}\)

In decimal form:

      | \(J_1\)  | \(J_2\)
\(G\) | \(0.56\) | \(0.70\)
\(B\) | \(0.44\) | \(0.30\)

One Ball from Jars Sample Space

Scenario:

Probability tree:

\(\star\) A probability tree is a way to visualize all possible outcomes in the sample space. We look at all the branches of this tree that correspond to our desired outcome.

One Ball from Jars Example

What is the probability that the ball drawn is \(G\)?

\[ \begin{aligned} P(G) & = P(J_1 \cap G) + P(J_2 \cap G) \\ & = P(J_1) P(G|J_1) + P(J_2) P(G|J_2) \\ & = \left(\frac{1}{2}\right) \left(\frac{14}{25}\right) + \left(\frac{1}{2}\right) \left(\frac{14}{20}\right) \\ & = \frac{63}{100} \\ P(G) & = 0.63 \longleftarrow \text{equivalent to } P(G) = 1 - P(B) \end{aligned} \]
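The computation of \(P(G)\) can be verified exactly with Python's `fractions` module:

```python
from fractions import Fraction

# Law of total probability over the two jars, each chosen with probability 1/2.
p_j1 = p_j2 = Fraction(1, 2)
p_g_given_j1 = Fraction(14, 25)
p_g_given_j2 = Fraction(14, 20)

p_g = p_j1 * p_g_given_j1 + p_j2 * p_g_given_j2
print(p_g)         # 63/100
print(float(p_g))  # 0.63
```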

One Ball from Jars in Reverse

With the same set-up and assumptions, we now view the same scenario in reverse.

Scenario:

Probability tree in reverse:

\(\star\) The reverse probability tree is a way to visualize all possible outcomes given a sample (i.e., an observation or a data point).

One Ball from Jars in Reverse Example

\[ \begin{aligned} P(J_1|G) & = \frac{P(J_1 \cap G)}{P(G)} \\ & = \frac{P(G|J_1)P(J_1)}{P(G)} \longleftarrow \text{Bayes' Rule} \\ & = \frac{P(G|J_1)P(J_1)}{P(J_1) P(G|J_1) + P(J_2) P(G|J_2)} \\ & = \frac{\left(\frac{1}{2}\right) \left(\frac{14}{25}\right)}{\left(\frac{1}{2}\right) \left(\frac{14}{25}\right) + \left(\frac{1}{2}\right) \left(\frac{14}{20}\right)} \\ P(J_1|G) & = \frac{4}{9} \approx 0.444 \end{aligned} \]

\(\star\) This is consistent with the idea that conditional probability is not always symmetric, meaning \(P(G|J_1) \ne P(J_1|G)\), because \(\displaystyle P(G|J_1) = \frac{14}{25} = 0.56\) is not equal to \(\displaystyle P(J_1|G) \approx 0.444\).
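The reverse computation can likewise be checked exactly; a minimal Python sketch:

```python
from fractions import Fraction

p_j1 = p_j2 = Fraction(1, 2)
p_g_given_j1 = Fraction(14, 25)
p_g_given_j2 = Fraction(14, 20)

# Bayes' Rule: P(J1 | G) = P(G | J1) P(J1) / P(G)
p_g = p_j1 * p_g_given_j1 + p_j2 * p_g_given_j2
p_j1_given_g = p_g_given_j1 * p_j1 / p_g

print(p_j1_given_g)         # 4/9
print(float(p_j1_given_g))  # approximately 0.444
```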

Law of Total Probability

Let \(A\) and \(B\) be events in the sample space \(S\). Then,

\[ \begin{aligned} P(B) & = P(B \cap A) + P(B \cap A^c) \\ \text{or} & \\ P(B) & = P(B|A)P(A) + P(B|A^c)P(A^c) \end{aligned} \]

\(\star\) This law decomposes the probability of \(B\) into two pieces: one where \(A\) happens and one where \(A\) does not happen.

Bayes’ Rule

Let \(A\) and \(B\) be events in the sample space \(S\).

\[ \begin{aligned} P(A|B) & = \frac{P(B|A)P(A)}{P(B)} \\ \text{or} & \\ P(A|B) & = \frac{P(B|A)P(A)}{P(B|A)P(A) + P(B|A^c)P(A^c)} \end{aligned} \]

\(\star\) This rule is a simple statement about conditional probabilities that allows the computation of \(P(A|B)\) from \(P(B|A)\) and \(P(A)\). It also makes clear that conditional probabilities are not always symmetric (i.e., \(P(A|B) \ne P(B|A)\) in general).

Two Balls from Jars

With the same set-up and assumptions, we now draw two balls instead of one.

Scenario:

Probability tree:

\(\star\) The probability tree now has three levels instead of two because we draw two balls instead of one. Computing the probabilities now depends on whether we sample with or without replacement.

Two Balls from Jars (Sampling With Replacement)

Conditional Probabilities:

Combination | \(J_1\) | \(J_2\)
\(GG\) | \(P(GG|J_1) = \left(\frac{14}{25}\right) \left(\frac{14}{25}\right)\) | \(P(GG|J_2) = \left(\frac{14}{20}\right) \left(\frac{14}{20}\right)\)
\(GB\) | \(P(GB|J_1) = \left(\frac{14}{25}\right) \left(\frac{11}{25}\right)\) | \(P(GB|J_2) = \left(\frac{14}{20}\right) \left(\frac{6}{20}\right)\)
\(BG\) | \(P(BG|J_1) = \left(\frac{11}{25}\right) \left(\frac{14}{25}\right)\) | \(P(BG|J_2) = \left(\frac{6}{20}\right) \left(\frac{14}{20}\right)\)
\(BB\) | \(P(BB|J_1) = \left(\frac{11}{25}\right) \left(\frac{11}{25}\right)\) | \(P(BB|J_2) = \left(\frac{6}{20}\right) \left(\frac{6}{20}\right)\)

\(\star\) The notation \(GG\) represents the event where the 1st draw is \(G\) and the 2nd draw is \(G\). The notations \(GB\), \(BG\), and \(BB\) follow the same convention.

Example Under Sampling With Replacement

What is the probability that the two balls drawn contain exactly two \(G\)?

\[ \begin{aligned} P(GG) & = P(J_1)P(GG|J_1) + P(J_2)P(GG|J_2) \\ & = \left(\frac{1}{2}\right) \left(\frac{14}{25}\right)^2 + \left(\frac{1}{2}\right) \left(\frac{14}{20}\right)^2 \\ P(GG) & = \frac{2009}{5000} \approx 0.402 \end{aligned} \]
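As a sanity check, this with-replacement computation can be reproduced exactly with Python's `fractions` module:

```python
from fractions import Fraction

# P(GG) with replacement: the two draws are independent given the jar,
# so the conditional probability is a product of identical factors.
p_j1 = p_j2 = Fraction(1, 2)
p_gg_given_j1 = Fraction(14, 25) ** 2
p_gg_given_j2 = Fraction(14, 20) ** 2

p_gg = p_j1 * p_gg_given_j1 + p_j2 * p_gg_given_j2
print(p_gg)         # 2009/5000
print(float(p_gg))  # 0.4018
```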

Two Balls from Jars (Sampling Without Replacement)

Conditional Probabilities:

Combination \(J_1\) \(J_2\)
\(GG\) \(P(GG|J_1) = \left(\frac{14}{25}\right) \left(\frac{13}{24}\right)\) \(P(GG|J_2) = \left(\frac{14}{20}\right) \left(\frac{13}{19}\right)\)
\(GB\) \(P(GB|J_1) = \left(\frac{14}{25}\right) \left(\frac{11}{24}\right)\) \(P(GB|J_2) = \left(\frac{14}{20}\right) \left(\frac{6}{19}\right)\)
\(BG\) \(P(BG|J_1) = \left(\frac{11}{25}\right) \left(\frac{14}{24}\right)\) \(P(BG|J_2) = \left(\frac{6}{20}\right) \left(\frac{14}{19}\right)\)
\(BB\) \(P(BB|J_1) = \left(\frac{11}{25}\right) \left(\frac{10}{24}\right)\) \(P(BB|J_2) = \left(\frac{6}{20}\right) \left(\frac{5}{19}\right)\)

\(\star\) The multiplication rule for dependent events was used to determine these probabilities. Notice that the second factor of each product has changed to reflect the reduced population.

Example Under Sampling Without Replacement

What is the probability that the two balls drawn contain exactly two \(G\)?

\[ \begin{aligned} P(GG) & = P(J_1)P(GG|J_1) + P(J_2)P(GG|J_2) \\ & = \left(\frac{1}{2}\right) \left(\frac{14}{25}\right) \left(\frac{13}{24}\right) + \left(\frac{1}{2}\right) \left(\frac{14}{20}\right) \left(\frac{13}{19}\right) \\ P(GG) & = \frac{4459}{11400} \approx 0.391 \\ \end{aligned} \]
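The without-replacement version can be checked the same way; note how the second factor in each conditional probability shrinks with the jar:

```python
from fractions import Fraction

# P(GG) without replacement: the second draw depends on the first.
p_j1 = p_j2 = Fraction(1, 2)
p_gg_given_j1 = Fraction(14, 25) * Fraction(13, 24)
p_gg_given_j2 = Fraction(14, 20) * Fraction(13, 19)

p_gg = p_j1 * p_gg_given_j1 + p_j2 * p_gg_given_j2
print(p_gg)         # 4459/11400
print(float(p_gg))  # approximately 0.391
```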

Two Balls from Jars in Reverse

With the same set-up and assumptions, we now view the same scenario in reverse.

Scenario:

Probability tree in reverse:

\(\star\) You still need to consider whether the balls are sampled with or without replacement, and Bayes' Rule has to be applied to compute the probabilities.

Two Balls from Jars in Reverse Example

Assume that we sample without replacement.

\[ \begin{aligned} P(J_1|GG) & = \frac{P(J_1 \cap GG)}{P(GG)} \\ & = \frac{P(GG|J_1)P(J_1)}{P(GG)} \longleftarrow \text{Bayes' Rule} \\ & = \frac{P(GG|J_1)P(J_1)}{P(J_1) P(GG|J_1) + P(J_2) P(GG|J_2)} \\ & = \frac{\left(\frac{1}{2}\right) \left(\frac{14}{25}\right) \left(\frac{13}{24}\right)}{\left(\frac{1}{2}\right) \left(\frac{14}{25}\right) \left(\frac{13}{24}\right) + \left(\frac{1}{2}\right) \left(\frac{14}{20}\right) \left(\frac{13}{19}\right)} \\ P(J_1|GG) & = \frac{19}{49} \approx 0.388 \end{aligned} \]

\(\star\) The complement probability \(P(J_2|GG) = 1- P(J_1|GG) = 1 - 0.388 = 0.612\) indicates that the balls are more likely to have come from \(J_2\).
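The reverse computation, with its complement, can be verified exactly in Python:

```python
from fractions import Fraction

# Reverse question: which jar did two green balls (drawn without
# replacement) most likely come from?
p_j1 = p_j2 = Fraction(1, 2)
p_gg_given_j1 = Fraction(14, 25) * Fraction(13, 24)
p_gg_given_j2 = Fraction(14, 20) * Fraction(13, 19)

p_gg = p_j1 * p_gg_given_j1 + p_j2 * p_gg_given_j2
p_j1_given_gg = p_gg_given_j1 * p_j1 / p_gg  # Bayes' Rule

print(p_j1_given_gg)      # 19/49
print(1 - p_j1_given_gg)  # 30/49, so J2 is more likely
```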

Disease Testing

Scenario:

A person takes a test for a rare disease. How do we know the probability they actually have the disease given a positive test result?

Information given:

Bayes’ Rule:

\[ \begin{aligned} P(D|+) & = \frac{P(+|D)P(D)}{P(+)} \\ & = \frac{P(+|D)P(D)}{P(+|D)P(D) + P(+|D^c)P(D^c)} \\ & = \frac{(0.99)(0.01)}{(0.99)(0.01) + (0.05)(1-0.01)} \\ & = \frac{0.0099}{0.0594} \\ P(D|+) & \approx 0.167 \end{aligned} \]

Interpretation:

\(\star\) This highlights the importance of considering prior probabilities (prevalence).
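The disease-testing computation is easy to reproduce numerically; a minimal sketch using only the numbers stated above:

```python
# Disease-testing example via Bayes' Rule:
# prevalence 1%, sensitivity 99%, false-positive rate 5%.
p_d = 0.01                # P(D), prior (prevalence)
p_pos_given_d = 0.99      # P(+ | D), sensitivity
p_pos_given_not_d = 0.05  # P(+ | D^c), false-positive rate

# Law of total probability for P(+), then Bayes' Rule for P(D | +).
p_pos = p_pos_given_d * p_d + p_pos_given_not_d * (1 - p_d)
p_d_given_pos = p_pos_given_d * p_d / p_pos

print(round(p_pos, 4))         # 0.0594
print(round(p_d_given_pos, 3))  # 0.167
```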

A Fundamental Concept in Statistical Thinking

Bayes’ Theorem (or Bayes’ Rule), which provides a formula for updating the probability of a hypothesis based on new evidence, is often described through various colloquialisms in statistics.

Hypothesis statements and data:

Consider hypothesis statements as events and data as observed evidence.

According to Bayes’ Rule,

\[ \begin{aligned} P(H|E) & = \frac{P(E|H)P(H)}{P(E)} \\ \text{or} & \\ \text{posterior} & = \frac{\text{likelihood} \times \text{prior}}{\text{marginal}}. \end{aligned} \]

\(\star\) In short, learning from data means continuously revising what you think is true as new evidence arrives.
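This revise-as-evidence-arrives idea can be made concrete with a short Python sketch; the helper `update` is an illustrative name (not from the slides), and the numbers reuse the disease-testing example, where each positive test turns the current posterior into the next prior:

```python
# Sequential Bayesian updating: posterior = likelihood * prior / marginal.
def update(prior, p_e_given_h, p_e_given_not_h):
    """One application of Bayes' Rule: returns P(H | E)."""
    marginal = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / marginal

prior = 0.01            # P(H): disease prevalence
sens, fpr = 0.99, 0.05  # P(E|H) and P(E|H^c) for a positive test

post1 = update(prior, sens, fpr)  # belief after one positive test
post2 = update(post1, sens, fpr)  # belief after a second positive test

print(round(post1, 3))  # 0.167
print(round(post2, 3))
```

A second independent positive test pushes the posterior much higher, illustrating how repeated evidence compounds.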

Terms explained:

Frequentist vs Bayesian Probabilities (1/2)

Frequentist Perspective:

\(\dagger\) This course focuses only on statistical methods based on the frequentist perspective of probability.

In inference:

\(\star\) Most statistical methods introduced in high school and early college are primarily based on the frequentist perspective of probability.

Frequentist vs Bayesian Probabilities (2/2)

Bayesian Perspective:

\(\dagger\) There are actually different forms of Bayes' Theorem, but let's just table that until our next life.

In inference:

\(\star\) By combining frequentist and Bayesian approaches, you can perform more flexible and powerful statistical analyses.