MTH-361A | Spring 2025 | University of Portland
February 10, 2025
Interpreting Probability
Frequentist probability refers to the interpretation of probability based on the long-run frequency of an event occurring in repeated trials or experiments.
The Expected Value
For a discrete r.v. \(X\), the expected value is given by \[\text{E}(X) = \sum_{i=1}^{n} x_i P(X = x_i), \ \text{ for } \ i = 1,2, \cdots n\] where \(P(X = x_i)\) represents the PMF of \(X\) evaluated at \(x_i\).
The Variance
For a r.v. \(X\), the variance is given by \[\text{Var}(X) = \text{E}\left(X^2 \right) - \left( \text{E}(X) \right)^2\] where \(\text{E}\left(X^2 \right) = \sum_{i=1}^{n} x_i^2 P(X = x_i)\) is the 2nd raw moment for a discrete r.v..
A Bernoulli r.v. represents a single experiment with two possible outcomes: “success” (\(X=1\)) with probability \(p\) and failure (\(X=0\)) with probability \(1-p\). We denote this r.v. and its PMF as \[ \begin{aligned} \text{R.V. } & \longrightarrow X \sim \text{Bern}(p) \\ \text{PMF } & \longrightarrow P(X = x) = p^x (1-p)^{1-x}, \ \ x \in \{0,1\} \end{aligned}. \]
We have shown that the expected value of \(X\) is \(\text{E}(X) = p\) and the variance of \(X\) is \(\text{Var}(X) = p(1-p)\) using the definition of expected value and variance respectively.
The Bernoulli PMF can be visualized using the vertical line plot because the Bernoulli r.v. is discrete.
Example: Suppose that \(p=0.60\), meaning the the “success” probability is \(P(X=1)=0.60\) and the “failure” probability is \(P(X=0)=0.40\).
# set Bernoulli parameter
p <- 0.6
# set PMF of the Bernoulli r.v.
pmf <- tibble(x=c("0","1"),
p=c(1-p,p))
# plot the Bernoulli distribution and store it into a R variable
p1 <- ggplot(pmf,aes(x=x,y=p)) +
geom_point(size=3) + # size here is defined for all points
geom_segment(aes(x=x,xend=x,
y=c(0,0),yend=p)) + # draws a line between two defined points
ggtitle(paste("Bernoulli PMF (p=",p,")",sep="")) + # sets the title of the plot
ylim(0,1) + # set limits of the y-axis for viewing
scale_x_discrete(labels=c("0"="0","1"="1")) +
theme_minimal() # set theme of entire plot
# display plot
p1
A Bernoulli trial is a random experiment with two
possible outcomes: success (1) or failure (0). The following R code
sequence uses the tidyverse
package.
# set hyperparameters
set.seed(42) # for reproducibility
n <- 10 # number of trials
# set parameter and PMF of the Bernoulli r.v
X <- c("0","1") # outcomes ("0"="failure","1"="success")
p <- 0.6 # probability of success
bern_pmf <- c(1-p,p)
# simulate n Bernoulli trials
samples <- sample(X,size=n,prob=bern_pmf,replace=TRUE)
# convert samples into tibble form and compute proportions
samples_tib <- tibble(X=samples) %>%
group_by(X) %>%
summarise(count=n(),proportion=count/n)
# view tibble
samples_tib
## # A tibble: 2 × 3
## X count proportion
## <chr> <int> <dbl>
## 1 0 7 0.7
## 2 1 3 0.3
The Bernoulli trials simulations are considered nominal categorical data. A bar plot that shows the proportion of “success” and “failure” outcomes are most appropriate.
Here, we are using the geom_col()
function instead of
the geom_bar()
because the data we are using is already
summarized instead of the actual data.
# create the Bernoulli trials plot and store it into a R variable
p2 <- ggplot(samples_tib,aes(x=X,y=proportion)) +
geom_col(fill="#009159") +
ylim(0,1) + # set limits of the y-axis for viewing
ggtitle(paste("Bernoulli Trials ","(n =",n,", p=",p,")",sep="")) +
theme_minimal() # set theme of entire plot
# display plot
p2
We can compare the Bernoulli PMF and the Trials by plotting the plots side-by-side for easy comparison.
Load Packages
We using the the gridExtra
package so that we can use
the grid.arrange()
function, which allows us to make
subplots.
Plot the Bernoulli PMF and the trials
\(\star\) Key Idea: Since we only have 10 trials the proportions of the samples is not quite the same as the parameters of the Bernoulli r.v. due to sampling variability.
Suppose we increase the number of Bernoulli Trials and compare it to the Bernoulli PMF.
Let \(p=0.60\) (“success” probability) and \(n=1000\) (number of Bernoulli Trials).
\(\star\) Key Idea: As we increase the number of Bernoulli trials from \(n=10\) to \(n=1000\), the proportion of \(H\) and \(T\) are close to the Bernoulli PMF, which shows the frequentist interpretation of Probability.
The Law of Large Numbers states that as the number of trials in a random experiment increases, the sample mean approaches the expected value.
Example Bernoulli Trials Simulation
Let \(p=0.60\) be the “success” probability of a Bernoulli r.v. \(X\), where \(\text{E}(X) = p\).
\(\star\) Key Idea: By the Law of Large Numbers, as the number of Bernoulli trials increases, the sample proportion of “success” converges to the expected value.
.Rmd
file by replacing [name]
with your name
using the format [First name][Last initial]
. Then, open the
.Rmd
file.