Probability &
Random Variables

Applied Statistics

MTH-361A | Spring 2025 | University of Portland

February 5, 2025

Objectives

Previously… (1/3)

The guiding principle of statistics is statistical thinking.

Statistical Thinking in the Data Science Life Cycle

Previously… (2/3)

Exploratory Analysis

It is the process of analyzing and summarizing datasets to uncover patterns, trends, relationships, and anomalies before inference.

Inference

It is the process of drawing conclusions about a population based on sample data. This involves using data from a sample to make generalizations, predictions, or decisions about a larger group.

Previously… (3/3)

Types of Inference:

Toy Example

Suppose you have two coins. Which of the two coins is fair?

What is a fair coin? Each flip has only two possible outcomes: heads or tails, but not both. A fair coin is one for which heads and tails are equally likely on every flip.

\(\dagger\) How would you know which coin is fair if they are “similar” in appearance and weight?

Toy Example: Coin Flips

Flip the coin \(6\) times.

Data:

Sample Proportion of Heads:

\(\dagger\) If we flip the coins more times and add the results to the totals, will the proportion of heads change?

Toy Example: True vs Sample Proportion

Flip the coin \(16\) times.

Data:

Sample Proportion of Heads:

True Proportion of Heads: We don’t know the true proportion of heads for each coin or which one is fair, but we know a fair coin should yield a 0.50 proportion of heads.

\(\star\) Key Idea: The goal of parameter estimation is to determine the true proportion of heads for coins A and B, accounting for uncertainty from random sampling (coin flips).

\(\dagger\) How many coin flips should you do until you are certain which one is the fair coin?
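The effect of the number of flips on the sample proportion can be explored with a small simulation. This is only a sketch: the true proportions assigned to coins A and B below (0.50 and 0.70) are hypothetical values chosen for illustration, and the helper name `sample_proportion` is not from the slides.

```python
import random

def sample_proportion(p_heads, n_flips, rng):
    """Flip a coin with P(H) = p_heads n_flips times; return the proportion of heads."""
    heads = sum(rng.random() < p_heads for _ in range(n_flips))
    return heads / n_flips

rng = random.Random(361)  # fixed seed so the run is reproducible

# Hypothetical true proportions, assumed only for this illustration.
true_p = {"A": 0.50, "B": 0.70}

for n in (6, 16, 1000):
    estimates = {coin: sample_proportion(p, n, rng) for coin, p in true_p.items()}
    print(n, estimates)
```

With few flips the two sample proportions can easily overlap; with many flips each one tends to settle near its own true proportion.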

Probability is the Basis for Inference

\(\star\) Key Idea: Probability bridges the gap between sample data and population conclusions.

Toy Example: Which of the two coins is fair?

Once we estimate the true proportion of heads, we can test which coin is fair through hypothesis testing. We can frame the test in two ways:

Way 1:

Way 2:

\(\star\) Key Idea: Hypothesis testing involves two opposing statements: the null hypothesis and the alternative hypothesis, which we aim to test.

The P-Value is a Probability

\(\star\) Key Idea: The p-value quantifies how surprising the sample data is under the assumption that the null hypothesis is true.
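As a rough illustration of a p-value as a probability, the sketch below counts head/tail sequences under the null hypothesis that a coin is fair. The observed count (12 heads in 16 flips) is a made-up number, and "at least as surprising" is taken here to mean "at least as far from the expected 8 heads", which is one common two-sided convention and an assumption of this example.

```python
from fractions import Fraction
from math import comb

def p_value_fair_coin(observed_heads, n_flips):
    """Two-sided p-value under the null hypothesis that the coin is fair.

    Under the null, all 2**n_flips sequences are equally likely, and
    comb(n_flips, k) of them contain exactly k heads.
    """
    total_sequences = 2 ** n_flips
    expected = Fraction(n_flips, 2)
    observed_distance = abs(observed_heads - expected)
    # Count sequences whose head count is at least as far from expected.
    favorable = sum(
        comb(n_flips, k)
        for k in range(n_flips + 1)
        if abs(k - expected) >= observed_distance
    )
    return Fraction(favorable, total_sequences)

# Hypothetical data: 12 heads in 16 flips.
p_value = p_value_fair_coin(observed_heads=12, n_flips=16)
print(p_value, float(p_value))  # 2517/32768, about 0.0768
```

A small p-value means the observed data would be surprising if the coin were fair.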

Probability and Statistics

Probability

Statistics

Basic Probability Definition

Probability is the branch of mathematics that deals with randomness. A probability measures the likelihood of an outcome happening.

When all outcomes are equally likely, the extent to which an outcome is likely to occur is \[\text{probability} = \frac{\text{number of favorable outcomes}}{\text{total number of outcomes}}.\]

Coin

Fair Coin

Dice

Fair Dice

Standard Deck of Cards

52-Card Deck

Probability Notations (1/2)

We will use specific words for outcomes.

Fair Coin Example:

Probability Notations (2/2)

We will use specific notations for probabilities.

Let \(A\) be an event with a finite sample space \(S\) of equally likely outcomes. The probability of \(A\) is \[P(A) = \frac{|A|}{|S|} \longrightarrow P(A) = \frac{\text{number of outcomes favorable to } A}{\text{total number of outcomes in } S}.\]

Fair Coin Example:

\[ \begin{aligned} \text{probability of } H & = \frac{1}{2} \longrightarrow P(H) = \frac{1}{2} \\ \text{probability of } T & = \frac{1}{2} \longrightarrow P(T) = \frac{1}{2} \end{aligned} \]
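The counting formula \(P(A) = |A|/|S|\) can be sketched with Python sets, assuming every outcome in the sample space is equally likely. The function name `probability` and the die events are illustrative choices, not part of the slides.

```python
from fractions import Fraction

def probability(event, sample_space):
    """Classical probability |A| / |S| for equally likely outcomes."""
    event = set(event) & set(sample_space)  # ignore outcomes outside S
    return Fraction(len(event), len(sample_space))

coin = {"H", "T"}
die = {1, 2, 3, 4, 5, 6}

print(probability({"H"}, coin))     # 1/2
print(probability({2, 4, 6}, die))  # 1/2
print(probability({5, 6}, die))     # 1/3
```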

Set Notation

Suppose we have events A and B:

\(A \cap B\) is the set of all objects in A AND B.

\(A \cup B\) is the set of all objects in A OR B.

Independence

Two events, \(A\) and \(B\), are independent if the occurrence of one does not affect the probability of the other: \[P(A \cap B) = P(A)P(B)\]

If the event \(B\) is dependent on \(A\), then \[P(A \cap B) \ne P(A)P(B)\]

\(\star\) Key Idea: Events are independent when one event happening does not affect the probability of the other. Events are disjoint when one event happening prevents the other.
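The product rule can be checked by enumeration. The sketch below assumes two flips of a fair coin with all four sequences equally likely; the events A, B, C, and D are illustrative choices.

```python
from fractions import Fraction
from itertools import product

# All equally likely sequences of two fair coin flips.
sample_space = set(product("HT", repeat=2))

def P(event):
    """Classical probability of an event (a subset of sample_space)."""
    return Fraction(len(event), len(sample_space))

A = {s for s in sample_space if s[0] == "H"}  # first flip is heads
B = {s for s in sample_space if s[1] == "H"}  # second flip is heads
C = {("H", "H")}                              # both flips are heads
D = {s for s in sample_space if s[0] == "T"}  # first flip is tails

independent_AB = P(A & B) == P(A) * P(B)  # 1/4 == 1/2 * 1/2
independent_AC = P(A & C) == P(A) * P(C)  # 1/4 != 1/2 * 1/4
disjoint_AD = P(A & D) == 0               # A and D cannot both occur
independent_AD = P(A & D) == P(A) * P(D)  # 0 != 1/2 * 1/2

print(independent_AB, independent_AC, disjoint_AD, independent_AD)
# True False True False
```

Events A and D are disjoint but not independent: knowing the first flip is heads rules out tails entirely.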

Coin Flips

Suppose we conduct an experiment of flipping fair coins in sequence and record the outcomes.

\(\dagger\) How many possible outcomes are there for three coins and what are the probabilities?
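One way to approach this question is to list the sample space directly. The sketch assumes three independent flips of a fair coin, so every sequence has the same probability.

```python
from fractions import Fraction
from itertools import product

# Every sequence of three flips, e.g. "HHT".
outcomes = ["".join(flips) for flips in product("HT", repeat=3)]

# Independence gives each sequence probability (1/2)^3 = 1/8.
probabilities = {o: Fraction(1, 2) ** 3 for o in outcomes}

print(len(outcomes))               # 8
print(probabilities["HHT"])        # 1/8
print(sum(probabilities.values())) # 1
```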

Disjoint and Joint Events

Two events, \(A\) and \(B\), are disjoint (or mutually exclusive) if they cannot occur at the same time: \[P(A \cap B) = 0.\]

Two events, \(A\) and \(B\), are joint if they can happen together: \[P(A \cap B) \ne 0.\]

Fair Coin Example:

Union of Events

The union of two events, \(A\) and \(B\), is the event that at least one of them occurs: \[P(A \cup B) = P(A) + P(B) - P(A \cap B)\]

If \(A\) and \(B\) are disjoint, then \[P(A \cup B) = P(A) + P(B)\]

\(\star\) Key Idea: The probability of the union is the sum of individual probabilities minus their intersection (to avoid double-counting).

Joint vs Disjoint Venn Diagram

Drawing Cards

Suppose we conduct an experiment of drawing a card from a 52-card deck and recording specific characteristics of it.

\(\dagger\) Can you compute the probability of drawing a face card (Ace, Jack, Queen, King) or a Diamond?
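A possible check of this computation by enumeration is sketched below. It follows the grouping stated in the question (Ace, Jack, Queen, King) and assumes each of the 52 cards is equally likely to be drawn; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["Clubs", "Diamonds", "Hearts", "Spades"]
deck = set(product(ranks, suits))  # 52 (rank, suit) pairs

def P(event):
    """Classical probability of drawing a card in the event."""
    return Fraction(len(event), len(deck))

face_ranks = {"A", "J", "Q", "K"}  # grouping used in the question above
face = {card for card in deck if card[0] in face_ranks}
diamond = {card for card in deck if card[1] == "Diamonds"}

by_counting = P(face | diamond)
by_union_rule = P(face) + P(diamond) - P(face & diamond)

print(by_counting, by_union_rule)  # 25/52 25/52
```

The two events are joint (four cards belong to both), so the intersection must be subtracted once.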

Dice Rolls

Suppose we conduct an experiment of rolling two six-sided dice and summing the outcomes.

\(\dagger\) Can you compute the probability of rolling a sum of 4?
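One way to check an answer is to list all ordered pairs of rolls. The sketch assumes two fair dice, so the 36 ordered pairs are equally likely.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (first die, second die).
rolls = list(product(range(1, 7), repeat=2))

# Pairs whose sum is 4: (1, 3), (2, 2), (3, 1).
sum_is_4 = [roll for roll in rolls if sum(roll) == 4]

p_sum_4 = Fraction(len(sum_is_4), len(rolls))
print(sum_is_4, p_sum_4)  # [(1, 3), (2, 2), (3, 1)] 1/12
```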

Basic Probability Rules

| Rule | Formula |
|---|---|
| Independence | \(P(A \cap B) = P(A)P(B)\) |
| Joint (Union) | \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\) |
| Disjoint | \(P(A \cap B) = 0\) |
| Complement | If \(B\) is the complement of \(A\), then \(P(A) + P(B) = 1\), so \(P(B) = 1 - P(A)\). |

Probability Axioms

| Axiom | Statement |
|---|---|
| \(P(S) = 1\) | The sum of the probabilities for all outcomes in the sample space is equal to 1. |
| \(P(A) \in [0,1]\) | Probabilities are never negative and always lie between \(0\) and \(1\), inclusive. |
| \(P(A \cup B) = P(A) + P(B)\) | If events A and B are disjoint (mutually exclusive), then their probabilities can be added. |

Random Variables

A random variable (r.v.) is a numerical outcome of a random experiment. It assigns a number to each possible outcome in a sample space.

In other words, a random variable is a function that maps the sample space into real numbers.

Types:

\(\star\) Key Idea: A random variable provides a way to assign numerical values to outcomes in a sample space, allowing us to analyze and compute probabilities in a structured manner.

Probability Functions

A probability function assigns probabilities to outcomes in a sample space.

In other words, a probability function maps the values of the r.v. into the real numbers between 0 and 1.

Types:

\(\star\) Key Idea: We can define a probability function directly from the sample space, but using a random variable makes it explicit what outcomes we want to compute probabilities for in a given scenario.

Flipping One Coin R.V.

Suppose we conduct an experiment of flipping a fair coin once.

\(\star\) Key Idea: A random variable for a coin toss maps the sample space \(\{H,T\}\) to real values, assigning \(X(H)=1\) and \(X(T)=0\). The probability function \(P(X)\) then defines the probability space.
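The mapping \(X(H)=1\), \(X(T)=0\) can be written out as a lookup table. This is a minimal sketch; the dictionary names are illustrative, and the outcome probabilities assume a fair coin.

```python
from fractions import Fraction

# Random variable: maps each outcome in {H, T} to a real number.
X = {"H": 1, "T": 0}

# Probability of each outcome for a fair coin.
outcome_prob = {"H": Fraction(1, 2), "T": Fraction(1, 2)}

# Probability function of X: add up the probabilities of the
# outcomes that X sends to each value.
pmf = {}
for outcome, value in X.items():
    pmf[value] = pmf.get(value, 0) + outcome_prob[outcome]

print(pmf[1], pmf[0])  # 1/2 1/2
```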

Flipping Two Coins R.V.

Suppose we conduct an experiment of flipping two fair coins in a sequence.

\(\star\) Key Idea: The PMF \(P(X)\) satisfies the probability axioms, and the collection of all probabilities forms the probability distribution.
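A PMF for two flips can be built by enumeration and checked against the axioms. The sketch assumes that \(X\) counts the number of heads, which is one natural choice of random variable for this experiment, and that both coins are fair.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: HH, HT, TH, TT, all equally likely.
sample_space = list(product("HT", repeat=2))

def X(outcome):
    """Random variable (assumed here): number of heads in the sequence."""
    return outcome.count("H")

# Count how many outcomes map to each value of X.
counts = Counter(X(outcome) for outcome in sample_space)
pmf = {x: Fraction(c, len(sample_space)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)  # 0 1/4, then 1 1/2, then 2 1/4

# Axiom checks: each probability lies in [0, 1] and they sum to 1.
all_in_range = all(0 <= p <= 1 for p in pmf.values())
total = sum(pmf.values())
print(all_in_range, total)  # True 1
```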

Interpreting Probability

Frequentist probability refers to the interpretation of probability based on the long-run frequency of an event occurring in repeated trials or experiments.

Coin Flipping Example

Suppose we conduct an experiment where we repeatedly flip a fair coin (\(P(H) = 0.50\)), tracking the cumulative count of \(H\) and its proportion after each flip.

\(\star\) Key Idea: As the number of flips (samples) increases, the proportion of \(H\) tends to get closer and closer to the true proportion of \(H\), which is \(P(H)=0.50\).
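The long-run behavior described above can be simulated. The sketch below tracks the cumulative proportion of heads after each flip; the function name, the seed, and the number of flips are arbitrary choices made for illustration.

```python
import random

def running_proportions(n_flips, p_heads=0.5, seed=361):
    """Return the cumulative proportion of heads after each flip."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    heads = 0
    proportions = []
    for flip in range(1, n_flips + 1):
        heads += rng.random() < p_heads  # True counts as 1
        proportions.append(heads / flip)
    return proportions

props = running_proportions(10_000)
for n in (10, 100, 1_000, 10_000):
    print(n, round(props[n - 1], 3))
```

Early proportions can be far from 0.50, while later ones typically fluctuate in a narrower and narrower band around it.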

Activity: Define a Random Variable and Compute Probabilities

  1. Make sure you have a copy of the W 2/5 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/
Speegle, D., & Clair, B. (2021). Probability, statistics, and data: A fresh approach using R. Chapman & Hall/CRC. https://probstatsdata.com/