MTH-361A | Spring 2026 | University of Portland
Definition:
\(\star\) The probability distribution of \(\overline{X}_n\) is called the sampling distribution.
Definition:
CLT Conditions:
The Normal Distribution:
The normal distribution approximates the sampling distribution of the sample mean as long as the CLT conditions hold.
\(\star\) With a sufficiently large sample size, the mean of the sampling distribution is approximately equal to the corresponding parameter of the underlying population.
The CLT is very useful because it tells us that, for a sufficiently large sample size, the sampling distribution of the sample mean is approximately normal, regardless of the population’s distribution shape.
Sampling distribution of the sample mean: \(\overline{X}_n \sim \text{N}\left(\mu, \frac{\sigma^2}{n}\right)\), approximately.
The CLT z-score: \(Z = \frac{\overline{X}_n - \mu}{\sigma / \sqrt{n}}\)
Under CLT, \(\mu\) is the mean and \(\frac{\sigma^2}{n}\) is the variance.
\(\star\) The sample standard deviation \(s\) is used when the population standard deviation \(\sigma\) is unknown.
\(\star\) The sampling distribution is centered at the same mean \(\mu\), but its standard deviation is the population standard deviation multiplied by a factor of \(\frac{1}{\sqrt{n}}\).
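The claims above can be checked with a small simulation. This is only a sketch under stated assumptions: the population is taken to be exponential with rate 1 (so \(\mu = 1\) and \(\sigma = 1\)), and the sample size, number of repetitions, and seed are arbitrary choices, not values from these notes.

```r
# Simulation sketch: sampling distribution of the sample mean
# drawn from a right-skewed population (assumed: exponential, rate = 1).
set.seed(361)            # arbitrary seed for reproducibility

n     <- 50              # sample size (assumed to be "sufficiently large")
reps  <- 10000           # number of repeated samples
mu    <- 1               # population mean of Exp(1)
sigma <- 1               # population standard deviation of Exp(1)

# Each replicate draws n observations and records their sample mean.
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

sim_mean  <- mean(xbar)        # should be close to mu
sim_sd    <- sd(xbar)          # should be close to sigma / sqrt(n)
theory_sd <- sigma / sqrt(n)

# Standardize with the CLT z-score and check how often |Z| <= 1.96.
z        <- (xbar - mu) / theory_sd
coverage <- mean(abs(z) <= 1.96)   # near 0.95 if the normal approximation is good

print(c(sim_mean = sim_mean, sim_sd = sim_sd,
        theory_sd = theory_sd, coverage = coverage))
```

Calling `hist(xbar)` afterwards should show a roughly bell-shaped histogram even though the population itself is strongly skewed.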
Suppose that we want to estimate the mean annual income of the city of Fancyland.
Data:
Here, we show a distribution of \(140\) annual household incomes. The numbers are in thousands.
Let \(\overline{x} = 245\) be the estimate.
\(\dagger\) The sample mean \(\overline{x}\) is known as an unbiased estimator of the population mean \(\mu\) because \(\text{E}(\overline{X}) = \mu\) over repeated sampling.
Sampling distribution:
\(\star\) We usually can’t collect data multiple times due to practical constraints. That is why we rely on the CLT to describe the sampling distribution, as long as its conditions hold.
Suppose we want to estimate the variance of the sampling distribution of the mean annual income.
Information from the data:
The unknowns:
Best Estimate:
\(\dagger\) Similar to the sample mean, the sample variance \(s^2\) is also an unbiased estimator of the population variance \(\sigma^2\) because \(\text{E}(s^2) = \sigma^2\) over repeated sampling.
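The unbiasedness of \(s^2\), and the plug-in estimate \(\frac{s^2}{n}\) for the variance of the sampling distribution, can be illustrated with a short simulation. This is a sketch only: the normal population, its variance of 4, the sample size, and the seed are all hypothetical choices and are not the Fancyland data.

```r
# Simulation sketch: the sample variance s^2 is unbiased for sigma^2.
set.seed(2026)           # arbitrary seed for reproducibility

sigma2 <- 4              # HYPOTHETICAL population variance
n      <- 10             # sample size
reps   <- 20000          # number of repeated samples

# var() divides by (n - 1), which is what makes s^2 unbiased.
s2 <- replicate(reps, var(rnorm(n, mean = 0, sd = sqrt(sigma2))))

mean_s2     <- mean(s2)                  # close to sigma2
mean_biased <- mean(s2 * (n - 1) / n)    # dividing by n instead underestimates sigma2

# For a single sample, the estimated variance of the sample mean is s^2 / n.
x            <- rnorm(n, mean = 0, sd = sqrt(sigma2))
est_var_xbar <- var(x) / length(x)

print(c(mean_s2 = mean_s2, mean_biased = mean_biased,
        est_var_xbar = est_var_xbar))
```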
The mean annual income \(\overline{x} = 245\) is an estimate for an unknown parameter, but how good is this estimate?
Why CLT matters for parameter estimation:
Sampling distribution of the observation:
\(\star\) The \(0.95\) confidence level is the long-run probability that intervals constructed from repeated samples will contain the true population mean \(\mu\).
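A confidence interval based on the CLT can be sketched as follows. The helper `ci_mean` is a hypothetical name introduced here for illustration. The values \(\overline{x} = 245\) and \(n = 140\) come from the example, but the sample standard deviation is not given in these notes, so `s_hyp` below is a placeholder assumption.

```r
# Sketch of a CLT-based confidence interval for the population mean.
# ci_mean is a hypothetical helper, not a function from any package.
ci_mean <- function(xbar, s, n, level = 0.95) {
  se     <- s / sqrt(n)                      # estimated standard error
  z_star <- qnorm(1 - (1 - level) / 2)       # critical value, about 1.96 for 0.95
  c(lower = xbar - z_star * se, upper = xbar + z_star * se)
}

xbar  <- 245     # sample mean from the example (in thousands)
n     <- 140     # sample size from the example
s_hyp <- 200     # HYPOTHETICAL sample standard deviation (not from the notes)

ci <- ci_mean(xbar, s_hyp, n)
print(ci)
```

A larger \(n\) or a smaller \(s\) shrinks the standard error and therefore narrows the interval around \(\overline{x}\).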
The mean annual income \(\overline{x} = 245\) is an observation from the data, but how far is this observation from a null value?
Why it matters for hypothesis testing:
Sampling distribution of the null value:
\(\star\) The p-value answers the question: If the true mean were \(\mu = 225\), how likely is it to observe a sample mean this far (or farther) from \(\mu\) just by random chance?
The null hypothesis:
The alternative hypothesis:
The p-value:
Using R:
## [1] 0.1475498
\(\star\) Note the conditional probability notation \(P(Z \ge z | H_0)\) for the p-value, which is not the same as \(P(H_0 | Z \ge z)\).
Interpreting the p-value:
\(\star\) Deciding whether we have sufficient evidence (to “reject” or “fail to reject” the null hypothesis) requires a significance level, a probability that we choose in advance and use to compare with the p-value.
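The p-value calculation and the comparison with a significance level can be sketched in R. This assumes an upper-tail test, consistent with the notation \(P(Z \ge z | H_0)\), with null value \(\mu = 225\). The helper `p_value_upper`, the standard deviation `s_hyp`, and the significance level `alpha` are hypothetical choices, so the result will not necessarily match the value printed in the notes.

```r
# Sketch of an upper-tail p-value under the CLT.
# p_value_upper is a hypothetical helper, not a function from any package.
p_value_upper <- function(xbar, mu0, s, n) {
  z <- (xbar - mu0) / (s / sqrt(n))   # CLT z-score under the null hypothesis
  pnorm(z, lower.tail = FALSE)        # P(Z >= z | H0)
}

xbar  <- 245     # observed sample mean from the example
mu0   <- 225     # null value from the example
n     <- 140     # sample size from the example
s_hyp <- 200     # HYPOTHETICAL sample standard deviation (not from the notes)
alpha <- 0.05    # significance level, chosen in advance (assumed here)

p_val    <- p_value_upper(xbar, mu0, s_hyp, n)
decision <- if (p_val < alpha) "reject H0" else "fail to reject H0"

print(p_val)
print(decision)
```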