Inference for One Mean

Applied Statistics

MTH-361A | Spring 2025 | University of Portland

March 24, 2025

Objectives

Previously…

Central Limit Theorem (CLT)

The CLT says that the distribution of the sample mean (or sum) of many independent and identically distributed random variables with finite variance approaches a normal distribution, regardless of the shape of the original distribution.

CLT Conditions

Central Limit Theorem for the Sample Mean

When we collect a sufficiently large sample of \(n\) independent observations from a population with mean \(\mu\) and standard deviation \(\sigma,\) the sampling distribution of \(\bar{x}\) will be nearly normal with \[\text{Mean} \longrightarrow \mu \text{ and } \text{Standard Error} \longrightarrow SE = \frac{\sigma}{\sqrt{n}}.\]
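A quick simulation can make this concrete. The sketch below assumes an exponential population with rate 1 (so \(\mu = 1\) and \(\sigma = 1\)), chosen here only for illustration; the sample size, number of replications, and seed are likewise arbitrary.

```r
set.seed(361)                      # arbitrary seed for reproducibility
n     <- 50                        # sample size (illustrative)
reps  <- 10000                     # number of simulated samples
mu    <- 1                         # mean of an Exponential(rate = 1) population
sigma <- 1                         # its standard deviation

# Each replicate draws n observations and records the sample mean
xbars <- replicate(reps, mean(rexp(n, rate = 1)))

mean(xbars)                        # should be close to mu
sd(xbars)                          # should be close to sigma / sqrt(n)
sigma / sqrt(n)                    # theoretical standard error
```

A histogram of the simulated means, e.g. `hist(xbars)`, should look roughly bell-shaped even though the population itself is strongly right-skewed.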

Evaluating the two conditions required for modeling \(\bar{x}\)

Two conditions are required to apply the Central Limit Theorem for a sample mean \(\bar{x}\): independence of the sample observations and normality of the population they come from.

General rule for performing the normality check

Note, it often takes practice to get a sense for whether or not a normal approximation is appropriate.

Normality Assessment (1/2)

Consider the four plots provided that come from simple random samples from different populations.

Are the independence and normality conditions met in each case?

Histograms of samples from two different populations.


Normality Assessment (2/2)

The t-distribution (1/2)


Comparison of a \(t\)-distribution and a normal distribution.

The \(t\)-distribution is always centered at zero and has a single parameter: degrees of freedom. The degrees of freedom describes the precise form of the bell-shaped \(t\)-distribution. In general, we’ll use a \(t\)-distribution with \(df = n - 1\) to model the sample mean when the sample size is \(n\).

The t-distribution (2/2)


The larger the degrees of freedom the more closely the \(t\)-distribution resembles the standard normal distribution.
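One way to see this numerically is to compare upper quantiles of the \(t\)-distribution with the matching standard normal quantile. This is a minimal sketch; the degrees of freedom listed and the 97.5th percentile are choices made here for illustration.

```r
dfs    <- c(2, 5, 18, 30, 1000)    # illustrative degrees of freedom
t_crit <- qt(0.975, dfs)           # 97.5th percentile of each t-distribution
z_crit <- qnorm(0.975)             # 97.5th percentile of the standard normal

# The gap shrinks toward zero as the degrees of freedom grow
data.frame(df = dfs, t_crit = t_crit, gap = t_crit - z_crit)
```

The \(t\) quantiles are always larger than the normal quantile, reflecting the thicker tails of the \(t\)-distribution.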

Case Study I: Mercury content in Risso’s dolphins

We will identify a confidence interval for the average mercury content in dolphin muscle using a sample of 19 Risso’s dolphins from the Taiji area in Japan.

Summary of mercury content in the muscle of 19 Risso’s dolphins from the Taiji area. Measurements are in micrograms of mercury per wet gram of muscle \((\mu\)g/wet g).
n Mean SD Min Max
19 4.4 2.3 1.7 9.2

Are the independence and normality conditions satisfied for this dataset?
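One rough way to screen for outliers from the summary alone is to express the minimum and maximum as distances from the mean, measured in standard deviations. The 2.5 SD cutoff below is an assumed rule of thumb used for illustration, not a threshold stated on these slides; the summary values come from the table above.

```r
xbar  <- 4.4; s <- 2.3             # sample mean and SD from the table
x_min <- 1.7; x_max <- 9.2         # sample extremes from the table

z_min  <- (x_min - xbar) / s       # SDs between the minimum and the mean
z_max  <- (x_max - xbar) / s       # SDs between the maximum and the mean
cutoff <- 2.5                      # assumed rule-of-thumb threshold

c(z_min = z_min, z_max = z_max)
abs(c(z_min, z_max)) < cutoff      # TRUE means no extreme value is flagged
```

A summary-based check does not replace looking at a plot of the data, and independence still has to be argued from how the sample was collected.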

Case Study I: One-sample t-interval (1/2)

One-sample t-intervals

\[ \begin{aligned} \text{point estimate} \ &\pm\ t^*_{df} \times SE \\ \bar{x} \ &\pm\ t^*_{df} \times \frac{s}{\sqrt{n}} \end{aligned} \]

Using R to find the critical \(t^*\) with \(df=18\)

cl <- 0.95 # confidence level
lt <- (1-cl)/2 # lower tail probability
df <- 18 # degrees of freedom
qt(lt, df) # lower-tail quantile; t-star is its absolute value
## [1] -2.100922
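
Because the \(t\)-distribution is symmetric about zero, the lower-tail quantile is the negative of \(t^*\). A minimal sketch of two equivalent ways to get the positive critical value directly (the variable names are chosen here for illustration):

```r
cl <- 0.95                                             # confidence level
df <- 18                                               # degrees of freedom

t_star_a <- qt(1 - (1 - cl) / 2, df)                   # upper quantile
t_star_b <- qt((1 - cl) / 2, df, lower.tail = FALSE)   # same value via the upper tail
c(t_star_a, t_star_b)
```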

Case Study I: One-sample t-interval (2/2)

\[ \begin{aligned} \bar{x} \ &\pm\ t^*_{18} \times SE \\ 4.4 \ &\pm\ 2.1009 \times 0.528 \\ 4.4 \ &\pm\ 1.1093 \\ \end{aligned} \] \[(3.29,5.51)\]

We are 95% confident the average mercury content of muscles in Risso’s dolphins is between 3.29 and 5.51 \(\mu\)g/wet gram, which is considered extremely high.
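
The arithmetic can be reproduced in R from the summary statistics. This is a sketch with variable names chosen here; it carries full precision instead of rounded intermediate values, so the margin of error may differ slightly in later decimal places.

```r
n    <- 19                         # sample size
xbar <- 4.4                        # sample mean
s    <- 2.3                        # sample standard deviation
cl   <- 0.95                       # confidence level

se     <- s / sqrt(n)                          # standard error of the mean
t_star <- qt(1 - (1 - cl) / 2, df = n - 1)     # critical value with df = 18
me     <- t_star * se                          # margin of error
ci     <- c(lower = xbar - me, upper = xbar + me)
round(ci, 2)
```

With the raw measurements stored in a vector (say a hypothetical `x`), `t.test(x)$conf.int` would produce the same kind of interval; only the summary statistics are available here.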

Case Study II

Every year, the US releases to the public a large data set containing information on births recorded in the country. This data set has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children. We will work with a random sample of 1,000 cases from the data set released in 2014.

Here are four examples in the data set.

fage mage weeks visits weight sex habit
34 34 37 14 6.96 male nonsmoker
36 31 41 12 8.86 female nonsmoker
37 36 37 10 7.51 female nonsmoker
NA 16 38 NA 6.19 male nonsmoker

Case Study II: Baby Weights - Smoker vs Non-Smoker

We would like to know, is there convincing evidence that newborns from mothers who smoke have a different average birth weight than newborns from mothers who don’t smoke?

Here are the summary statistics for the dataset.

habit n Mean SD
nonsmoker 867 7.269873 1.232846
smoker 114 6.677193 1.596645

Case Study II: CLT Conditions

Conditions:

Since both conditions are satisfied, the difference in sample means may be modeled using a \(t\)-distribution.

Case Study II: Examining the Distributions (1/2)

The top panel represents birth weights for infants whose mothers smoked during pregnancy. The bottom panel represents the birth weights for infants whose mothers did not smoke during pregnancy.

Case Study II: Examining the Distributions (2/2)

Case Study II: One Sample t-test (1/4)

Consider one group (smokers) from the data. Suppose it is known that the average weight of a newborn baby is \(7.5\) lbs. We want to test whether the average weight for the smoking group is less than this value using a one-sample \(t\)-test.

Do the data (smoking group) provide convincing evidence that the average weight is less than \(7.5\) lbs?
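
In symbols, with \(\mu\) denoting the true mean birth weight (in lbs) of newborns whose mothers smoke, the claim corresponds to the hypotheses and test statistic below. The notation \(H_0\), \(H_A\), \(\mu_0\), and \(T\) is introduced here for illustration.

\[ H_0: \mu = 7.5 \qquad \text{versus} \qquad H_A: \mu < 7.5 \]

\[ T = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad \mu_0 = 7.5, \qquad df = n - 1 \]

Since the alternative is one-sided, the p-value is the lower-tail probability of the \(t\)-distribution at the observed value of \(T\).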

Case Study II: One Sample t-test (2/4)

Case Study II: One Sample t-test (3/4)

Case Study II: One Sample t-test (4/4)

Using R to find the p-value

df <- 113 # degrees of freedom
t <- -5.48495 # test statistic
pt(t, df) # lower-tail p-value, P(T <= t)
## [1] 1.278671e-07
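
The test statistic and p-value can also be computed directly from the summary table for the smoker group. This sketch uses the unrounded summary statistics, so its test statistic (about -5.50) differs slightly from the value shown here, which appears to have been computed from rounded inputs; the p-value stays far below any usual significance level. Variable names are chosen here for illustration.

```r
n    <- 114                        # smokers in the sample
xbar <- 6.677193                   # sample mean birth weight (lbs)
s    <- 1.596645                   # sample standard deviation (lbs)
mu0  <- 7.5                        # null value for the mean

se      <- s / sqrt(n)                     # standard error of the mean
t_stat  <- (xbar - mu0) / se               # one-sample t statistic
p_value <- pt(t_stat, df = n - 1)          # lower tail: alternative is mu < mu0
c(t_stat = t_stat, p_value = p_value)
```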

Conclusions:

Activity: Determine Confidence Intervals for One Mean

  1. Make sure you have a copy of the M 3/24 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/
Speegle, D., & Clair, B. (2021). Probability, statistics, and data: A fresh approach using R. Chapman and Hall/CRC. https://probstatsdata.com/