Central Limit Theorem

Applied Statistics

MTH-361A | Spring 2026 | University of Portland

Objectives

The Central Limit Theorem (CLT)

Definition:

\(\star\) The probability distribution of \(\overline{X}_n\) is called the sampling distribution.

The Sampling Distribution

Definition:

CLT Conditions:

The Normal Distribution:

The sampling distribution of the sample mean is approximately normal as long as the CLT conditions hold.

\(\star\) The mean of the sampling distribution equals the corresponding parameter of the underlying population, and with a sufficiently large sample size the sample mean itself tends to fall close to that parameter.

The CLT Formula

The CLT is very useful because it tells us that, for a sufficiently large sample size, the sampling distribution of the sample mean is approximately normal, regardless of the shape of the population’s distribution.

Sampling distribution of the sample mean:

The CLT z-score:

Under the CLT, the sampling distribution of \(\overline{X}_n\) has mean \(\mu\) and variance \(\frac{\sigma^2}{n}\).

\(\star\) The sample standard deviation \(s\) is used when the population standard deviation \(\sigma\) is unknown.
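In symbols, the statements on this slide correspond to the standard form of the CLT. The following is a sketch in the notation used above, assuming independent observations from a population with finite mean \(\mu\) and variance \(\sigma^2\); \(\text{SE}\) is a label introduced here for the estimated standard error.

\[
\overline{X}_n \;\overset{\text{approx.}}{\sim}\; N\!\left(\mu, \frac{\sigma^2}{n}\right),
\qquad
Z = \frac{\overline{X}_n - \mu}{\sigma/\sqrt{n}} \;\overset{\text{approx.}}{\sim}\; N(0, 1).
\]

When \(\sigma\) is unknown, replacing it with \(s\) gives the estimated standard error \(\text{SE} = s/\sqrt{n}\).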

The Sampling Distribution Under CLT

\(\star\) The sampling distribution is centered at the same mean \(\mu\) as the population, but its standard deviation is the population standard deviation \(\sigma\) multiplied by a factor of \(\frac{1}{\sqrt{n}}\).
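The \(\frac{1}{\sqrt{n}}\) scaling can be checked with a small simulation. The sketch below is illustrative only: the Exponential population, the sample size, the number of repetitions, and the seed are all assumptions chosen for the demonstration, not values from the course data.

```r
set.seed(361)  # arbitrary seed for reproducibility

n     <- 50      # sample size (hypothetical)
reps  <- 10000   # number of repeated samples (hypothetical)
sigma <- 1       # standard deviation of an Exponential(rate = 1) population

# Draw `reps` samples of size n from a right-skewed population
# and keep the mean of each sample.
sample_means <- replicate(reps, mean(rexp(n, rate = 1)))

sd_simulated <- sd(sample_means)  # spread of the simulated sampling distribution
sd_clt       <- sigma / sqrt(n)   # spread predicted by the CLT

c(simulated = sd_simulated, clt = sd_clt)
```

The two values should be close to each other, and a histogram of `sample_means` (for example with `hist(sample_means)`) should look roughly bell-shaped even though the population is skewed.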

Mean Annual Household Income (1/2)

Suppose that we want to estimate the mean annual income of the city of Fancyland.

Data:

Here, we show a distribution of \(140\) annual household incomes. The numbers are in thousands.

Let \(\overline{x} = 245\) be the estimate.

\(\dagger\) The sample mean \(\overline{x}\) is known as an unbiased estimator of the population mean \(\mu\) because \(\text{E}(\overline{X}) = \mu\) over repeated sampling.

Sampling distribution:

\(\star\) We usually cannot collect repeated samples because of practical constraints, so we rely on the CLT to describe the sampling distribution, as long as its conditions hold.

Mean Annual Household Income (2/2)

Suppose we want to estimate the variance of the sampling distribution of the mean annual income.

Information from the data:

The unknowns:

Best Estimate:

\(\dagger\) Similar to the sample mean, the sample variance \(s^2\) is an unbiased estimator of the population variance \(\sigma^2\) because \(\text{E}(S^2) = \sigma^2\) over repeated sampling.
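The plug-in calculation this slide points toward can be sketched as follows, assuming the sample size \(n = 140\) from the previous slide. The value of \(s\) is not reproduced here, so it is left symbolic, and the hats mark estimated quantities (notation introduced for this sketch).

\[
\widehat{\text{Var}}(\overline{X}) = \frac{s^2}{n} = \frac{s^2}{140},
\qquad
\widehat{\text{SE}}(\overline{X}) = \frac{s}{\sqrt{140}} \approx \frac{s}{11.83}.
\]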

Parameter Estimation

The mean annual income \(\overline{x} = 245\) is an estimate for an unknown parameter, but how good is this estimate?

Why CLT matters for parameter estimation:

Sampling distribution of the observation:

\(\star\) The \(0.95\) confidence level is the long-run probability that intervals constructed from repeated samples will contain the true population mean \(\mu\).
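The long-run interpretation can be illustrated by simulation. The sketch below assumes a normal-based interval of the form \(\overline{x} \pm z^{*} \cdot s/\sqrt{n}\) and a made-up normal population; the values of `mu`, `sigma`, and `reps` are hypothetical and are not taken from the Fancyland data.

```r
set.seed(361)  # arbitrary seed for reproducibility

mu    <- 225   # "true" population mean (hypothetical)
sigma <- 200   # population standard deviation (hypothetical)
n     <- 140   # sample size, matching the example
reps  <- 5000  # number of repeated samples (hypothetical)

z_star <- qnorm(0.975)  # critical value for a 0.95 confidence level

# For each repeated sample, build the interval and record whether it contains mu.
covered <- replicate(reps, {
  x     <- rnorm(n, mean = mu, sd = sigma)
  se    <- sd(x) / sqrt(n)
  lower <- mean(x) - z_star * se
  upper <- mean(x) + z_star * se
  lower <= mu && mu <= upper
})

mean(covered)  # proportion of intervals that contain mu
```

The proportion should land near \(0.95\). Any single interval either contains \(\mu\) or it does not; the confidence level describes the procedure over many samples.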

Hypothesis Testing (1/3)

The mean annual income \(\overline{x} = 245\) is an observation from the data, but how far is this observation from a null value?

Why it matters for hypothesis testing:

Sampling distribution of the null value:

\(\star\) The p-value answers the question: If the true mean were \(\mu = 225\), how likely is it to observe a sample mean this far (or farther) from \(\mu\) just by random chance?

Hypothesis Testing (2/3)

The null hypothesis:

The alternative hypothesis:

The p-value:

Using R:

# Upper-tail probability P(Z >= 1.047) under the standard normal N(0, 1)
1 - pnorm(1.047, mean = 0, sd = 1)
## [1] 0.1475498

\(\star\) Note the conditional probability notation \(P(Z \ge z | H_0)\) for the p-value, which is not the same as \(P(H_0 | Z \ge z)\).
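Putting the pieces of this example together, the test can be sketched as below. The null value \(225\) and the z-score \(1.047\) come from the slides; the one-sided alternative is inferred from the upper-tail probability used in the R call, and \(\mu_0\), \(\text{SE}\), and \(\Phi\) (the standard normal CDF) are labels introduced here. The standard error itself is not reproduced in this text, so it is left symbolic.

\[
H_0: \mu = 225, \qquad H_A: \mu > 225,
\]

\[
z = \frac{\overline{x} - \mu_0}{\text{SE}} = \frac{245 - 225}{\text{SE}},
\qquad
\text{p-value} = P(Z \ge 1.047 \mid H_0) = 1 - \Phi(1.047) \approx 0.1475.
\]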

Hypothesis Testing (3/3)

Interpreting the p-value:

\(\star\) Deciding whether we have sufficient evidence (to “reject” or “fail to reject” the null hypothesis) requires a significance level, a probability that we choose in advance and use to compare with the p-value.
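The comparison between the p-value and the significance level can be sketched in R. The value `alpha <- 0.05` is a common convention used here only as an assumption; the slides do not state which level is chosen.

```r
alpha   <- 0.05  # significance level (hypothetical choice, fixed before seeing the data)
p_value <- 1 - pnorm(1.047, mean = 0, sd = 1)  # p-value from the previous slide

# Reject the null hypothesis only when the p-value falls below alpha.
decision <- if (p_value < alpha) "reject H0" else "fail to reject H0"
decision
```

With these inputs the p-value of about \(0.1475\) exceeds \(0.05\), so the decision is to fail to reject the null hypothesis. This is not the same as showing that the null hypothesis is true.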