Objectives

Develop an understanding of t-distributions
Know how to compute confidence intervals for one mean
Understand the conditions for the Central Limit Theorem (CLT) for sample means
Activity: Determine Confidence Intervals for One Mean

These slides are derived from Diez et al. (2012).

Previously…

Central Limit Theorem (CLT)

The CLT says that the distribution of the sample mean (or sum) of many independent and identically distributed random variables with finite variance approaches a normal distribution as the sample size grows, regardless of the shape of the original distribution.

CLT Conditions

Independence – Sample values must be independent
Identical Distribution – Variables should be from the same distribution
Finite Variance – The population must have a finite variance
Large Sample Size – A larger sample size improves approximation

Central Limit Theorem for the Sample Mean

When we collect a sufficiently large sample of \(n\) independent observations from a population with mean \(\mu\) and standard deviation \(\sigma,\) the sampling distribution of \(\bar{x}\) will be nearly normal with \[\text{Mean} \longrightarrow \mu \text{ and } \text{Standard Error} \longrightarrow SE = \frac{\sigma}{\sqrt{n}}.\]
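The statement above can be checked with a small simulation. This is a minimal sketch, not part of the original slides: the Exponential(rate = 1) population (so \(\mu = 1\) and \(\sigma = 1\)), the sample size, and the number of simulated samples are all assumptions chosen for illustration.

```r
# Simulate the sampling distribution of x-bar for a skewed population.
# Assumption: Exponential(rate = 1) population, so mu = 1 and sigma = 1.
set.seed(1)
n      <- 50      # observations per sample
n_sims <- 10000   # number of simulated samples

# One sample mean per simulated sample
x_bars <- replicate(n_sims, mean(rexp(n, rate = 1)))

se_theory <- 1 / sqrt(n)   # sigma / sqrt(n)

mean(x_bars)   # should be close to mu = 1
sd(x_bars)     # should be close to se_theory
se_theory

# Share of sample means within 1.96 SE of mu; near 0.95 if x-bar is nearly normal
mean(abs(x_bars - 1) <= 1.96 * se_theory)
```

Rerunning with a smaller `n` (say 5) should show a worse match to the normal model, which is why the sample size condition matters for skewed populations.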

Evaluating the two conditions required for modeling \(\bar{x}\)

Two conditions are required to apply the Central Limit Theorem for a sample mean \(\bar{x}:\)

Independence. The sample observations must be independent. The most common way to satisfy this condition is when the sample is a simple random sample from the population.
Normality. When a sample is small, we also require that the sample observations come from a normally distributed population. We can relax this condition more and more for larger and larger sample sizes. This condition is vague, which makes it difficult to evaluate, so next we introduce a couple of rules of thumb to make checking it easier.

General rule for performing the normality check

Note, it often takes practice to get a sense for whether or not a normal approximation is appropriate.

\(\mathbf{n < 30}:\) If the sample size \(n\) is less than 30 and there are no clear outliers in the data, then we typically assume the data come from a nearly normal distribution to satisfy the condition.
\(\mathbf{n \geq 30}:\) If the sample size \(n\) is at least 30 and there are no particularly extreme outliers, then we typically assume the sampling distribution of \(\bar{x}\) is nearly normal, even if the underlying distribution of individual observations is not.

Normality Assessment (1/2)

Consider the two plots provided, which come from simple random samples from different populations.

Are the independence and normality conditions met in each case?

Histograms of samples from two different populations.

Normality Assessment (2/2)

The first sample has fewer than 30 observations, so we are watching for any clear outliers. With no clear outliers, the normality condition can be reasonably assumed to be met.
The second sample has a sample size greater than 30 and includes an outlier. This is an example of a particularly extreme outlier, so the normality condition would not be satisfied.

The t-distribution (1/2)

Comparison of a \(t\)-distribution and a normal distribution.

The \(t\)-distribution is always centered at zero and has a single parameter: degrees of freedom. The degrees of freedom describes the precise form of the bell-shaped \(t\)-distribution. In general, we’ll use a \(t\)-distribution with \(df = n - 1\) to model the sample mean when the sample size is \(n\).

The t-distribution (2/2)

The larger the degrees of freedom, the more closely the \(t\)-distribution resembles the standard normal distribution.
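One way to see this convergence is to compare the upper 2.5% cutoffs of several \(t\)-distributions with the corresponding normal cutoff. The sketch below uses base R's `qt()` and `qnorm()`; the particular degrees of freedom listed are arbitrary choices for illustration.

```r
# Upper 2.5% cutoffs for t-distributions with increasing degrees of freedom
dfs    <- c(2, 5, 18, 30, 100, 1000)   # illustrative values, not from the slides
t_star <- qt(0.975, df = dfs)

# Matching cutoff for the standard normal distribution
z_star <- qnorm(0.975)

# The t cutoffs shrink toward the normal cutoff as df grows
data.frame(df = dfs,
           t_star = round(t_star, 3),
           z_star = round(z_star, 3))
```

The \(t\) cutoffs are always larger than the normal cutoff, reflecting the thicker tails of the \(t\)-distribution, and the gap is widest for small samples.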

Case Study I: Mercury content in Risso’s dolphins

We will identify a confidence interval for the average mercury content in dolphin muscle using a sample of 19 Risso’s dolphins from the Taiji area in Japan.

Summary of mercury content in the muscle of 19 Risso’s dolphins from the Taiji area. Measurements are in micrograms of mercury per wet gram of muscle (\(\mu\)g/wet g).
n	Mean	SD	Min	Max
19	4.4	2.3	1.7	9.2

Are the independence and normality conditions satisfied for this dataset?

The observations are a simple random sample, so it is reasonable to assume that the dolphins are independent.
The summary statistics do not suggest any clear outliers, with all observations within 3 standard deviations of the mean.
Based on this evidence, the normality condition seems reasonable.
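The outlier check can be reproduced from the summary table alone. This is a rough sketch using only the reported mean, standard deviation, minimum, and maximum; with the raw data one would also look at a histogram or dot plot.

```r
# Summary statistics from the table (micrograms of mercury per wet gram)
x_bar   <- 4.4
s       <- 2.3
obs_min <- 1.7
obs_max <- 9.2

# Distance of the most extreme observations from the mean, in SD units
z_min <- (obs_min - x_bar) / s   # about -1.17
z_max <- (obs_max - x_bar) / s   # about  2.09

# TRUE if every observation lies within 3 SDs of the mean
all(abs(c(z_min, z_max)) < 3)
```

Because the minimum and maximum bound every other observation, checking these two values is enough to confirm the claim about all 19 dolphins.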

Case Study I: One-sample t-interval (1/2)

One-sample t-intervals

\[ \begin{aligned} \text{point estimate} \ &\pm\ t^*_{df} \times SE \\ \bar{x} \ &\pm\ t^*_{df} \times \frac{s}{\sqrt{n}} \end{aligned} \]

We plug \(s\) and \(n\) into the formula: \(SE = \frac{s}{\sqrt{n}} = \frac{2.3}{\sqrt{19}} = 0.528.\)
The degrees of freedom is easy to calculate: \(df = n-1 = 19-1 = 18.\)
We find the cutoff where the upper tail is equal to 2.5%: \(t^*_{18} = 2.10.\) The area below -2.10 will also be equal to 2.5%.

Using R to find \(t^*\)

# Lower-tail cutoff with 2.5% of the area below it; t* is its absolute value
qt(0.025, df = 18)

## [1] -2.100922
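The three steps can also be carried through in R from the summary statistics. This is a minimal sketch: the variable names are our own, and it assumes a 95% confidence level, matching the 2.5% tail areas used above.

```r
# Summary statistics for the 19 Risso's dolphins
n     <- 19
x_bar <- 4.4
s     <- 2.3

se     <- s / sqrt(n)           # standard error, about 0.528
df     <- n - 1                 # degrees of freedom, 18
t_star <- qt(0.975, df = df)    # upper 2.5% cutoff, about 2.10

# point estimate +/- t* x SE
ci <- x_bar + c(-1, 1) * t_star * se
round(ci, 2)                    # roughly 3.29 to 5.51
```

Using `qt(0.975, ...)` returns the positive cutoff directly, which avoids taking the absolute value of the lower-tail result.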

Case Study I: One-sample t-interval (2/2)