MTH-361A | Spring 2025 | University of Portland
March 24, 2025
Central Limit Theorem (CLT)
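The CLT states that the distribution of the sample mean (or sum) of many independent and identically distributed random variables approaches a normal distribution as the sample size grows, regardless of the shape of the original distribution (provided it has a finite variance).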
CLT Conditions
When we collect a sufficiently large sample of \(n\) independent observations from a population with mean \(\mu\) and standard deviation \(\sigma\), the sampling distribution of \(\bar{x}\) will be nearly normal with \[\text{Mean} = \mu \quad \text{and} \quad \text{Standard Error } SE = \frac{\sigma}{\sqrt{n}}.\]
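A quick simulation can make this concrete. The sketch below uses assumed settings that do not come from these notes: an exponential population with rate 1 (so \(\mu = \sigma = 1\)), samples of size \(n = 50\), and 10,000 repetitions. The population is strongly right-skewed, yet the simulated sample means should center near \(\mu\) with a spread near \(\sigma/\sqrt{n}\).

```r
set.seed(361)     # arbitrary seed so the run is reproducible
n     <- 50       # assumed sample size
reps  <- 10000    # number of simulated samples
mu    <- 1        # mean of an Exp(rate = 1) population
sigma <- 1        # SD of an Exp(rate = 1) population

# Draw `reps` samples of size n from a right-skewed population
# and keep the mean of each sample
xbar <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbar)        # should be close to mu = 1
sd(xbar)          # should be close to the theoretical SE
sigma / sqrt(n)   # theoretical SE, about 0.1414
hist(xbar, breaks = 50,
     main = "Simulated sampling distribution of x-bar")
```

The histogram of xbar should look roughly bell-shaped even though individual exponential draws are far from normal; increasing n makes the approximation better.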
Two conditions are required to apply the Central Limit Theorem for a sample mean \(\bar{x}:\)
Independence. The sample observations must be independent. The most common way to satisfy this condition is to take a simple random sample from the population.
Normality. When a sample is small, we also require that the sample observations come from a normally distributed population. This condition can be relaxed more and more as the sample size grows. Because the condition is vague and hard to evaluate directly, we use two rules of thumb. If \(n < 30\) and there are no clear outliers in the data, we typically assume the data come from a nearly normal distribution. If \(n \ge 30\) and there are no particularly extreme outliers, we typically assume the sampling distribution of \(\bar{x}\) is nearly normal, even if the individual observations are not.
Note that it often takes practice to develop a sense of whether a normal approximation is appropriate.
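The outlier checks used in the examples that follow (watch for clear outliers when \(n < 30\), and only for particularly extreme outliers when \(n \ge 30\)) can be roughed out in code. What counts as "clear" or "particularly extreme" is a judgment call; this sketch assumes, hypothetically, boxplot-style fences at 1.5 × IQR and 3 × IQR beyond the quartiles. The function name normality_rule_of_thumb is made up for illustration and is not a standard R function.

```r
# Hypothetical helper: a rough, automated version of the normality rules of thumb.
# The IQR multipliers (1.5 and 3) are assumptions, not official cutoffs.
normality_rule_of_thumb <- function(x) {
  n   <- length(x)
  q   <- quantile(x, c(0.25, 0.75), names = FALSE)  # Q1 and Q3
  iqr <- q[2] - q[1]
  # Small samples: flag "clear" outliers (assumed 1.5 * IQR fences).
  # Larger samples: flag only "particularly extreme" ones (assumed 3 * IQR fences).
  k     <- if (n < 30) 1.5 else 3
  lower <- q[1] - k * iqr
  upper <- q[2] + k * iqr
  # TRUE means nothing was flagged, so normality is plausibly met
  all(x >= lower & x <= upper)
}

normality_rule_of_thumb(c(2.1, 3.4, 2.8, 3.9, 3.1, 2.5, 3.6, 2.9))  # TRUE
normality_rule_of_thumb(c(1:40, 500))                               # FALSE
```

A numeric rule like this is only a screening aid; looking at a histogram of the data, as in the examples below, is still the main check.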
Consider the two plots below, which come from simple random samples from different populations.
Are the independence and normality conditions met in each case?
Histograms of samples from two different populations.
Both samples are simple random samples, so the independence condition is met in each case. The first sample has fewer than 30 observations, so we watch for any clear outliers. Since there are none, the normality condition can reasonably be assumed to be met.
The second sample has more than 30 observations but includes a particularly extreme outlier, so the normality condition is not satisfied.
Comparison of a \(t\)-distribution and a normal distribution.
When \(\sigma\) is unknown and we substitute the sample standard deviation \(s\), the standardized sample mean is better modeled by a \(t\)-distribution than by a normal distribution. The \(t\)-distribution is always centered at zero and has a single parameter: degrees of freedom. The degrees of freedom describes the precise form of the bell-shaped \(t\)-distribution. In general, we use a \(t\)-distribution with \(df = n - 1\) to model the sample mean when the sample size is \(n\).
The larger the degrees of freedom, the more closely the \(t\)-distribution resembles the standard normal distribution.
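One way to see this numerically is to compare 95% critical values: as the degrees of freedom grow, the \(t\) critical value should shrink toward the normal value of about 1.96. The df values below are chosen arbitrarily for illustration.

```r
# Compare 95% critical values of t-distributions with the standard normal.
# The df values are arbitrary illustrative choices.
dfs    <- c(2, 5, 10, 18, 30, 100)
t_star <- qt(0.975, df = dfs)  # upper 2.5% point of each t-distribution
z_star <- qnorm(0.975)         # upper 2.5% point of N(0, 1), about 1.96

round(data.frame(df = dfs, t_star = t_star, z_star = z_star), 3)

# Overlay densities: the t curve with df = 2 has visibly heavier tails
curve(dnorm(x), from = -4, to = 4, ylab = "density")
curve(dt(x, df = 2), add = TRUE, lty = 2)
```

The heavier tails are why \(t^*\) is larger than \(z^*\) for small samples: the extra width accounts for the uncertainty in estimating \(\sigma\) with \(s\).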
We will construct a confidence interval for the average mercury content in the muscle of Risso’s dolphins, using a sample of 19 dolphins from the Taiji area in Japan.
Mercury content in muscle (\(\mu\)g/wet gram):

| \(n\) | Mean \(\bar{x}\) | SD \(s\) | Min | Max |
|---|---|---|---|---|
| 19 | 4.4 | 2.3 | 1.7 | 9.2 |
Are the independence and normality conditions satisfied for this dataset?
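The raw data are not shown, so normality can only be screened through the summary statistics, and independence depends on how the dolphins were sampled, which the table cannot tell us. One rough screen (an assumption on our part, not a rule from these notes) is to express the minimum and maximum as distances from the mean in standard deviations; extremes only a couple of SDs away would not, on their own, point to a clear outlier.

```r
# Summary statistics from the table (mercury, micrograms per wet gram)
xbar <- 4.4
s    <- 2.3
mn   <- 1.7
mx   <- 9.2

# Distance of each extreme from the mean, in standard deviations
z_min <- (mn - xbar) / s
z_max <- (mx - xbar) / s
round(c(z_min = z_min, z_max = z_max), 2)  # z_min = -1.17, z_max = 2.09
```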
One-sample t-intervals
\[ \begin{aligned} \text{point estimate} \ &\pm\ t^*_{df} \times SE \\ \bar{x} \ &\pm\ t^*_{df} \times \frac{s}{\sqrt{n}} \end{aligned} \]
Using R to find the critical \(t^*\) with \(df=18\)
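```r
cl <- 0.95          # confidence level
lt <- (1 - cl) / 2  # lower-tail probability, 0.025
df <- 18            # degrees of freedom, n - 1 = 19 - 1
qt(lt, df)          # lower-tail quantile, equal to -t*
## [1] -2.100922
qt(1 - lt, df)      # upper-tail quantile, equal to t* by symmetry
## [1] 2.100922
```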
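\[ \begin{aligned} \bar{x} \ &\pm\ t^*_{18} \times \frac{s}{\sqrt{n}} \\ 4.4 \ &\pm\ 2.1009 \times \frac{2.3}{\sqrt{19}} \\ 4.4 \ &\pm\ 2.1009 \times 0.528 \\ 4.4 \ &\pm\ 1.1093 \end{aligned} \] \[(3.29,\ 5.51)\]
Here \(t^*_{18} = 2.1009\) is the absolute value of the lower-tail quantile returned by qt() above, since the \(t\)-distribution is symmetric about zero.
We are 95% confident that the average mercury content in the muscle of Risso’s dolphins is between 3.29 and 5.51 \(\mu\)g/wet gram, which is considered extremely high.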
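The steps above (critical value, standard error, margin of error) can be bundled into one function that works directly from summary statistics, which is handy when, as here, only a summary table is available. The name t_interval is hypothetical and not part of base R; with the raw observations in a vector x, base R’s t.test(x, conf.level = 0.95)$conf.int produces the same kind of interval.

```r
# Hypothetical helper: one-sample t-interval from summary statistics,
# following  x-bar +/- t*_df * s / sqrt(n)
t_interval <- function(xbar, s, n, cl = 0.95) {
  df     <- n - 1                     # degrees of freedom
  t_star <- qt(1 - (1 - cl) / 2, df)  # positive critical value t*
  se     <- s / sqrt(n)               # standard error of x-bar
  me     <- t_star * se               # margin of error
  c(lower = xbar - me, upper = xbar + me)
}

# Risso's dolphin summary: n = 19, mean 4.4, SD 2.3 (micrograms/wet gram)
ci <- t_interval(xbar = 4.4, s = 2.3, n = 19, cl = 0.95)
round(ci, 2)  # lower 3.29, upper 5.51
```

This version keeps the unrounded SE (about 0.5277) instead of 0.528, so its margin of error is about 1.1086 rather than 1.1093; both give the same interval after rounding to two decimals.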