Probability Mass Functions


Previously… (1/3)

Discrete Random Variables and Probability Mass Functions

[PSDR] Definition 3.3

A discrete random variable is a random variable that takes integer values. A discrete random variable is characterized by its probability mass function (PMF). The PMF \(p\) of a random variable \(X\) is given by \[p(x) = P (X=x).\]

[PSDR] Theorem 3.1

Let \(p\) be the probability mass function of \(X\).

  1. \(p(x) \ge 0 \text{ for all } x\).
  2. \(\sum_x p(x) = 1\).
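
As a quick illustration of both properties, the sketch below checks them for a fair six-sided die. The die and the names `x` and `p` are assumptions made for this example only; they do not come from [PSDR].

```r
# Hypothetical example: PMF of a fair six-sided die.
x <- 1:6
p <- rep(1 / 6, 6)

# Property 1: every probability is non-negative.
nonneg <- all(p >= 0)

# Property 2: the probabilities sum to 1 (up to floating-point rounding).
sums_to_one <- isTRUE(all.equal(sum(p), 1))

nonneg
sums_to_one
```

The comparison uses `all.equal` rather than `==` because sums of floating-point numbers may differ from 1 by a tiny rounding error.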

Previously… (2/3)

The Bernoulli Probability Mass Function

Let \(X\) be a Bernoulli random variable with PMF given as

\[P(X=x; p) = p^x (1-p)^{1-x}.\]

[PSDR] Definition 3.15

A Bernoulli trial is an experiment that can result in two outcomes, which we will denote as “success” and “failure.” The probability of a success will be denoted \(p\), and the probability of failure is therefore \(1-p\).
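
The Bernoulli PMF can be evaluated directly from its formula. The sketch below assumes an illustrative value \(p = 0.05\), and `bernoulli_pmf` is a helper name made up for this example. It also compares the result with `dbinom` using a single trial, since one Bernoulli trial is a binomial experiment with \(n = 1\).

```r
# Bernoulli PMF written directly from the formula p^x (1 - p)^(1 - x).
# bernoulli_pmf is a hypothetical helper, not a built-in R function.
bernoulli_pmf <- function(x, p) p^x * (1 - p)^(1 - x)

p <- 0.05      # illustrative probability of success (an assumption)
x <- c(0, 1)   # the two possible outcomes: failure (0) and success (1)

bernoulli_pmf(x, p)             # P(X = 0) and P(X = 1)
dbinom(x, size = 1, prob = p)   # same values from the binomial PMF with n = 1
```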

Previously… (3/3)

The Binomial Probability Mass Function

[PSDR] Definition 3.17

A random variable \(X\) is said to be a binomial random variable with parameters \(n\) and \(p\) if

\[P(X = x; n, p) = \binom{n}{x} p^x (1-p)^{(n-x)} \hspace{5px} \text{ for } x = 0,1,2,\cdots, n\]

where \(\binom{n}{x} = \frac{n!}{x!(n-x)!}\). We will sometimes write \(X \sim Binom(n,p)\).

[PSDR] Theorem 3.3

If \(X\) counts the number of successes in \(n\) independent and identically distributed (iid) Bernoulli trials, each with probability of success \(p\), then \(X \sim Binom(n,p)\).
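
To see that the formula in Definition 3.17 and R's `dbinom` agree, the sketch below evaluates both for every possible value of \(x\). The values \(n = 10\) and \(p = 0.3\) are arbitrary choices for illustration, and `binom_pmf` is a hypothetical helper name.

```r
# Binomial PMF written directly from Definition 3.17.
# binom_pmf is a hypothetical helper, not a built-in R function.
binom_pmf <- function(x, n, p) choose(n, x) * p^x * (1 - p)^(n - x)

n <- 10    # illustrative number of trials (an assumption)
p <- 0.3   # illustrative probability of success (an assumption)
x <- 0:n   # every value the random variable can take

manual <- binom_pmf(x, n, p)
builtin <- dbinom(x, size = n, prob = p)

all.equal(manual, builtin)   # the two computations should agree
sum(manual)                  # a valid PMF sums to 1
```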

Example 1

The following example scenarios CAN be modeled using a Binomial PMF because the sampling method is independent and there are only two outcomes.

Example 2 (1/3)

What are the chances that you encounter bots on Twitter?

Twitter bots are complicated, but let’s simplify. Say it is known that 5% of all tweets come from spam bots or fake accounts.

If you encounter 40 tweets in a given day, we can compute the probability that a certain number of those tweets are bot tweets using a Binomial PMF. Here, we assume that the tweet encounters are independent and identically distributed Bernoulli trials.

Let \(X\) be a binomial random variable with probability of success \(p = 0.05\) and number of independent trials \(n = 40\).

Example 2 (2/3)

\[\begin{align*} P(X = 3) = & \binom{40}{3} (0.05)^3 (1-0.05)^{(40-3)} \\ = & 0.1851 \\ \end{align*}\]

That is an 18.51% chance of encountering exactly three bot tweets out of 40 tweets.

You can use the dbinom function in R to compute the Binomial PMF.

dbinom(3,40,0.05)
## [1] 0.1851145

Example 2 (3/3)

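\[\begin{align*} P(1 \le X \le 3) = & \sum_{x = 0}^3 \binom{40}{x} (0.05)^x (1-0.05)^{(40-x)} \\ & - \binom{40}{0} (0.05)^0 (1-0.05)^{(40-0)} \\ = & 0.7333 \end{align*}\]

The summation term above, \(P(X \le 3)\), is a value of what’s known as the Cumulative Distribution Function (CDF).

That is a 73.33% chance of encountering at least one and at most three bot tweets out of 40 tweets. Without subtracting the \(x = 0\) term, the summation alone gives \(P(X \le 3) \approx 0.862\), the chance of encountering at most three bot tweets.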

You can use the pbinom function in R to compute the Binomial CDF.

pbinom(3,40,0.05) - dbinom(0,40,0.05)
## [1] 0.7333381
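
The two quantities involved here are easy to mix up, so the sketch below computes each one in two ways. `pbinom(3, n, p)` on its own is \(P(X \le 3)\); subtracting `dbinom(0, n, p)` removes the case of zero bot tweets and leaves \(P(1 \le X \le 3)\). The variable names are chosen for this illustration.

```r
n <- 40     # number of tweets seen in a day
p <- 0.05   # probability that a tweet is a bot tweet

# P(X <= 3): add up the PMF at x = 0, 1, 2, 3, or call the CDF directly.
at_most_three <- sum(dbinom(0:3, n, p))
pbinom(3, n, p)

# P(1 <= X <= 3): the same sum without the x = 0 term.
one_to_three <- sum(dbinom(1:3, n, p))
pbinom(3, n, p) - dbinom(0, n, p)
```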

The Cumulative Distribution Function (CDF)

[PSDR] Definition 4.4

The cumulative distribution function (cdf) associated with \(X\) is the function

\[F(x)= P (X \le x).\]

For a discrete random variable \(X\) with PMF \(P(X = x)\), the above is written as

\[F(x) = \sum_{n=-\infty}^x P(X = n).\]

Bernoulli CDF

Let \(X \sim Bernoulli(p)\) where \(p\) is the probability of success.

\[P(X=x; p) = p^x (1-p)^{1-x}.\]

\[\begin{align*} F(x) = & P(X \le x; p) \\ = & \sum_{n=0}^{x} p^n (1-p)^{1-n} \hspace{20px} \text{ for } x = 0, 1 \\ \end{align*}\]

\[ F(x) = \begin{cases} 0 & \text{ if } x < 0 \\ 1-p & \text{ if } 0 \le x < 1 \\ 1 & \text{ if } x \ge 1 \end{cases} \]
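
The piecewise form is a step function, which the sketch below evaluates at a few points on either side of the jumps. The value \(p = 0.05\) and the helper name `bernoulli_cdf` are assumptions for illustration. The result is compared with `pbinom` using a single trial.

```r
p <- 0.05   # illustrative probability of success (an assumption)

# Bernoulli CDF written as the piecewise function.
# bernoulli_cdf is a hypothetical helper, not a built-in R function.
bernoulli_cdf <- function(x, p) ifelse(x < 0, 0, ifelse(x < 1, 1 - p, 1))

x <- c(-0.5, 0, 0.5, 1, 2)   # points below, between, and above the jumps

bernoulli_cdf(x, p)
pbinom(x, size = 1, prob = p)   # same step function from the binomial CDF with n = 1
```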

Binomial CDF

Let \(X \sim Binom(n,p)\) where \(p\) is the probability of success and \(n\) is the number of trials.

\[P(X = x; n, p) = \binom{n}{x} p^x (1-p)^{(n-x)} \hspace{5px} \text{ for } x = 0,1,2,\cdots, n\]

Use dbinom in R.

\[\begin{align*} F(x) = & P(X \le x; n, p) \\ = & \sum_{k=0}^x \binom{n}{k} p^k (1-p)^{(n-k)} \\ \end{align*}\]

Use pbinom in R.
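
The relationship between the two functions can be checked numerically: a running total of `dbinom` over \(k = 0, 1, \ldots, n\) should reproduce `pbinom`. The sketch below reuses \(n = 40\) and \(p = 0.05\) from Example 2; the variable names are chosen for this illustration.

```r
n <- 40
p <- 0.05

# Running total of the PMF over k = 0, 1, ..., n gives F(0), F(1), ..., F(n).
cdf_from_pmf <- cumsum(dbinom(0:n, size = n, prob = p))
cdf_builtin <- pbinom(0:n, size = n, prob = p)

all.equal(cdf_from_pmf, cdf_builtin)

# R vectors are 1-indexed, so element 4 corresponds to x = 3.
cdf_builtin[4]
```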

Example 3 (1/3)

Now, let’s ask a slightly different question from the questions in Examples 1 and 2.

Consider a scenario where you look at multiple tweets. You don’t know in advance how many tweets you will look at. What is the probability of observing the 1st bot tweet on the 3rd tweet that you see?

Again, it is known that 5% of all tweets come from spam bots or fake accounts.

\[\begin{align*} P(\text{1 tweet until 1st bot tweet}) & = 0.05 = \frac{1}{20} \\ P(\text{2 tweets until 1st bot tweet}) & = \frac{19}{20}\frac{1}{20} \\ P(\text{3 tweets until 1st bot tweet}) & = \frac{19}{20}\frac{19}{20}\frac{1}{20} = \frac{19^2}{20^3} \end{align*}\]

The answer is \(P(\text{3 tweets until 1st bot tweet}) = \frac{361}{8000} \approx 0.0451\). There is a 4.51% probability of observing the 1st bot tweet on the 3rd tweet that you see. In other words, this is the probability of observing failures on the 1st and 2nd tweets that you see, and a success on the 3rd tweet.
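
The same number can be reproduced in R, either by multiplying the three probabilities by hand or with the built-in `dgeom`, whose first argument is the number of failures before the first success (here 2). The name `by_hand` is chosen for this illustration.

```r
p <- 1 / 20   # probability that a tweet is a bot tweet, from the example

# Two non-bot tweets followed by one bot tweet.
by_hand <- (1 - p)^2 * p
by_hand

# dgeom takes the number of failures before the first success (here 2).
dgeom(2, prob = p)
```

Both lines evaluate \(\frac{19^2}{20^3} = \frac{361}{8000} = 0.045125\).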

Geometric Random Variable (2/3)

What is the probability of observing the 1st bot tweet on the \(x\)-th tweet that you see?

\[\begin{align*} P(\text{1 tweet until 1st bot tweet}) & = 0.05 = \frac{1}{20} \\ P(\text{2 tweets until 1st bot tweet}) & = \frac{19}{20}\frac{1}{20} \\ P(\text{3 tweets until 1st bot tweet}) & = \frac{19}{20}\frac{19}{20}\frac{1}{20} = \frac{19^2}{20^3} \\ \cdots & \\ P(x \text{ tweets until 1st bot tweet}) & = \left(\frac{19}{20}\right)^{x-1} \frac{1}{20} \end{align*}\]

Geometric Random Variable (3/3)

In this example, the Geometric PMF is given as

\[P(X = x) = \left(\frac{19}{20}\right)^{x-1} \left(\frac{1}{20}\right)\]

where \(X\) is a Geometric random variable and \(x\) is the number of the tweet on which the first success occurs: the first \(x-1\) tweets are failures and the \(x\)-th tweet is the first success, with probability of success \(p = \frac{1}{20}\).

The geometric PMF gives the probability of each possible number of trials needed to get the 1st success, or equivalently of having \(x-1\) failures before it. In this example, observing a bot tweet is a “success”.

The Geometric Probability Mass Function

[PSDR] Definition 3.22

A random variable \(X\) is said to be a geometric random variable with parameter \(p\) if

\[P(X = x) = (1-p)^{x-1} p, \hspace{20px} x = 1, 2, 3, \cdots\]

[PSDR] Theorem 3.5

Let \(X\) be the random variable that counts the number of trials up to and including the first success in a Bernoulli process with probability of success \(p\), so that \(X-1\) is the number of failures before the first success. Then \(X\) is a geometric random variable.

Notice that the sample space is now infinite!
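
Even with infinitely many possible values, the probabilities still add up to 1. The sketch below is a numerical illustration rather than a proof: it sums \(P(X = x)\) over \(x = 1, \ldots, N\) for increasing \(N\), using \(p = 0.05\) from the Twitter example. `partial_sum` is a hypothetical helper name.

```r
p <- 0.05   # probability of success from the Twitter example

# Sum of P(X = x) for x = 1, ..., N.
# dgeom is indexed by the number of failures, so x corresponds to x - 1.
# partial_sum is a hypothetical helper, not a built-in R function.
partial_sum <- function(N, p) sum(dgeom(0:(N - 1), prob = p))

# The partial sums increase toward 1 as N grows.
sapply(c(10, 50, 100, 500), partial_sum, p = p)
```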

Geometric CDF

Let \(X \sim Geometric(p)\) where \(p\) is the probability of success.

\[P(X = x; p) = (1-p)^{x-1} p, \hspace{20px} x = 1, 2, 3, \cdots\]

Use dgeom in R. Note that dgeom counts the number of failures before the first success, so its first argument is \((x-1)\). So if \(x=3\), then you need dgeom(3-1, p).

\[\begin{align*} F(x) = & P(X \le x; p) \\ = & \sum_{k=1}^{x} (1-p)^{k-1} p \\ \end{align*}\]

\[ F(x) = \begin{cases} 1-(1-p)^{x} & \text{ if } x \ge 1 \\ 0 & \text{ if } x < 1 \\ \end{cases} \]

Use pgeom in R. As with dgeom, the first argument is \((x-1)\), so \(F(3)\) is pgeom(3-1, p).

In your mini-assignment today, you are going to verify that the Geometric PMF is a valid PMF and that the Geometric CDF has the closed form \(F(x)\) shown above.
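
A numerical check of the closed form, assuming \(p = 0.05\) and the first ten values of \(x\) as an illustration. It compares \(1-(1-p)^x\) with `pgeom` and with a running sum of `dgeom`, shifting the argument by one in both cases because R indexes the geometric distribution by the number of failures. This is only a sanity check and does not replace a derivation.

```r
p <- 0.05   # illustrative probability of success (an assumption)
x <- 1:10   # trial on which the first success occurs

closed_form <- 1 - (1 - p)^x                 # F(x) from the piecewise formula
builtin <- pgeom(x - 1, prob = p)            # pgeom is indexed by failures, hence x - 1
from_pmf <- cumsum(dgeom(x - 1, prob = p))   # running sum of the PMF

all.equal(closed_form, builtin)
all.equal(closed_form, from_pmf)
```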


Mini-Assignment: Probability Mass Functions
