Inference for Difference of Two Means

Applied Statistics

MTH-361A | Spring 2025 | University of Portland

March 26, 2025

Objectives

Previously… (1/2)

The \(t\)-distribution

The larger the degrees of freedom, the more closely the \(t\)-distribution resembles the standard normal distribution.
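One way to see this convergence numerically is to compare a critical value of the \(t\)-distribution at several degrees of freedom with the same critical value of the standard normal distribution. The sketch below uses base R's qt() and qnorm(); the particular degrees of freedom are arbitrary choices for illustration and do not come from the lecture.

```r
# Degrees of freedom to compare (arbitrary illustrative choices)
dfs <- c(2, 5, 10, 30, 100, 1000)

# 97.5th percentile of the t-distribution for each df
t_crit <- qt(0.975, dfs)

# 97.5th percentile of the standard normal distribution
z_crit <- qnorm(0.975)

# Gap between each t critical value and the normal one;
# it shrinks toward 0 as the degrees of freedom grow
gap <- t_crit - z_crit

data.frame(df = dfs, t_crit = t_crit, gap = gap)
```

The gap column is always positive because the \(t\)-distribution has heavier tails than the standard normal.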

Previously… (2/2)

Confidence Intervals for One Mean

\[ \begin{aligned} \bar{x} \ &\pm\ t^*_{df} \times \frac{s}{\sqrt{n}} \end{aligned} \]

\[ \begin{aligned} \bar{x} & \longrightarrow \text{sample mean (point estimate)} \\ s & \longrightarrow \text{sample standard deviation} \\ n & \longrightarrow \text{sample size} \\ t^*_{df} & \longrightarrow \text{critical value (t-distribution with degrees of freedom } df \text{)} \end{aligned} \]
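As a minimal sketch of the formula above, the following R code builds a 95% one-sample t-interval by hand and cross-checks it against t.test(). The data are the mpg column of R's built-in mtcars data set, an assumption made only for illustration (it is not data from this lecture), and the variable names are hypothetical.

```r
# Illustrative data: mpg column of R's built-in mtcars data set
# (an assumption for this sketch; not data from the lecture)
x <- mtcars$mpg

n     <- length(x)   # sample size
x_bar <- mean(x)     # sample mean (point estimate)
s     <- sd(x)       # sample standard deviation
df    <- n - 1       # degrees of freedom
cl    <- 0.95        # confidence level

t_star <- qt(1 - (1 - cl) / 2, df)  # critical value (upper tail, positive)
me     <- t_star * s / sqrt(n)      # margin of error

ci <- c(lower = x_bar - me, upper = x_bar + me)
ci

# Cross-check against R's built-in one-sample t-interval
t.test(x, conf.level = cl)$conf.int
```

The two intervals should agree, since t.test() uses the same formula for a single sample.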

Case Study I: Fuel Efficiency in the City

The problem below is adapted, with slight modifications, from Section 20.6 of your textbook, OpenIntro: Introduction to Modern Statistics. Consider the research study described below.

Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2021. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? (US DOE EPA, 2021)

We will compute the 95% confidence interval for the true difference in means \(\mu_{automatic} - \mu_{manual}\).

CITY Mean SD n
Automatic 17.44 3.44 25
Manual 22.68 4.58 25

Case Study I: Conditions

Here, we see two outliers in the manual group. However, both groups show reasonably well-behaved distributions with balanced outliers, so in this case, with 25 observations in each sample, we can “ignore” the outliers and assume that the sampling distribution of each sample mean is approximately normal.

Case Study I: Two-sample t-interval (1/3)

Two-sample t-intervals

\[ \begin{aligned} \text{point estimate} \ &\pm\ t^*_{df} \times SE \\ \bar{x}_1 - \bar{x}_2 \ &\pm\ t^*_{df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \end{aligned} \]

Case Study I: Two-sample t-interval (2/3)

Using R to find the critical \(t^*\)

cl <- 0.95 # confidence level
lt <- (1-cl)/2 # lower tail probability
df <- 24 # degrees of freedom: min(25, 25) - 1
qt(lt, df) # lower-tail quantile; t-star is its absolute value
## [1] -2.063899
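The pieces of the two-sample t-interval can be assembled directly from the summary table. The sketch below is one way to do it, assuming the conservative degrees of freedom min(n1 - 1, n2 - 1), which is consistent with the df of 24 used above. The variable names are hypothetical.

```r
# Summary statistics from the Case Study I table (city mileage, miles/gallon)
xbar_auto <- 17.44; s_auto <- 3.44; n_auto <- 25
xbar_man  <- 22.68; s_man  <- 4.58; n_man  <- 25

cl <- 0.95                          # confidence level
df <- min(n_auto - 1, n_man - 1)    # conservative degrees of freedom (assumed rule)
t_star <- qt(1 - (1 - cl) / 2, df)  # positive critical value

point_est <- xbar_auto - xbar_man   # automatic minus manual
se <- sqrt(s_auto^2 / n_auto + s_man^2 / n_man)  # standard error
me <- t_star * se                   # margin of error

ci <- c(lower = point_est - me, upper = point_est + me)
ci
```

Using the upper-tail quantile gives a positive t_star directly, so no sign flip is needed when forming the margin of error.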

Case Study I: Two-sample t-interval (3/3)

Therefore, we are 95% confident that the true difference in mean city fuel efficiency between automatic and manual cars, \(\mu_{automatic} - \mu_{manual}\), is between -7.6044 and -2.8756 miles/gallon (equivalently, between 2.8756 and 7.6044 in absolute value).

Note that the values are negative because of the order in which the difference is computed (automatic minus manual): a negative difference indicates that cars with manual transmission are more fuel efficient than cars with automatic transmission.

Degrees of Freedom

One-sample t-interval

\[df = n - 1\]

Two-sample t-interval

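\[df = \min(n_1 - 1,\ n_2 - 1)\]

This conservative choice matches the values used in these slides: \(df = 24\) in Case Study I and \(df = 113\) in Case Study II.

If the population variance is unknown, use the sample variance \(s^2\). If the population variance is known, use the population variance \(\sigma^2\). Most real-world problems, especially in statistical inference, involve sample variances.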
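The case studies in these slides use 24 and 113 degrees of freedom, which is consistent with taking the smaller sample size minus one. As a hedged comparison, the sketch below computes that conservative value next to the Welch-Satterthwaite approximation, which is what R's t.test() uses by default for two samples. The Welch formula is an addition for illustration and is not taken from these slides.

```r
# Case Study I summary statistics (from the fuel-efficiency table)
s1 <- 3.44; n1 <- 25   # automatic
s2 <- 4.58; n2 <- 25   # manual

# Conservative rule: smaller sample size minus 1
df_conservative <- min(n1 - 1, n2 - 1)

# Welch-Satterthwaite approximation (not from the slides)
v1 <- s1^2 / n1
v2 <- s2^2 / n2
df_welch <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))

c(conservative = df_conservative, welch = df_welch)

# A smaller df gives a larger critical value, hence a wider interval
qt(0.975, df_conservative) > qt(0.975, df_welch)
```

The conservative rule never overstates the degrees of freedom, so intervals built with it are at least as wide as those from the Welch approximation.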

Case Study II

Every year, the US releases to the public a large data set containing information on births recorded in the country. This data set has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children. We will work with a random sample of 1,000 cases from the data set released in 2014.

Here are four examples in the data set.

fage mage weeks visits weight sex habit
34 34 37 14 6.96 male nonsmoker
36 31 41 12 8.86 female nonsmoker
37 36 37 10 7.51 female nonsmoker
NA 16 38 NA 6.19 male nonsmoker

Case Study II: Baby Weights - Smoker vs Non-Smoker

We would like to know, is there convincing evidence that newborns from mothers who smoke have a different average birth weight than newborns from mothers who don’t smoke?

Here are the summary statistics for the data set.

habit n Mean SD
nonsmoker 867 7.269873 1.232846
smoker 114 6.677193 1.596645

Case Study II: CLT Conditions

Conditions:

Since both conditions are satisfied, the difference in sample means may be modeled using a \(t\)-distribution.

Case Study II: Examining the Distributions (1/2)

The top panel represents birth weights for infants whose mothers smoked during pregnancy. The bottom panel represents birth weights for infants whose mothers did not smoke during pregnancy.

Case Study II: Examining the Distributions (2/2)

Case Study II: Two Sample t-test (1/4)

habit n Mean SD
nonsmoker 867 7.269873 1.232846
smoker 114 6.677193 1.596645

Is there a difference in mean birth weight between the smoking group and the nonsmoking group?
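The test statistic for this question can be sketched from the summary table alone. The code below assumes the null hypothesis of no difference in population means and takes the difference as smoker minus nonsmoker; both are assumptions of this sketch, and the variable names are hypothetical. The result should come out close to the value of -3.8165 used later in these slides.

```r
# Summary statistics from the table above (birth weight)
n_non <- 867; xbar_non <- 7.269873; s_non <- 1.232846   # nonsmoker
n_smk <- 114; xbar_smk <- 6.677193; s_smk <- 1.596645   # smoker

# Point estimate: smoker minus nonsmoker (assumed order)
point_est <- xbar_smk - xbar_non

# Standard error of the difference in sample means
se <- sqrt(s_smk^2 / n_smk + s_non^2 / n_non)

# Test statistic under H0: no difference in population means
t_stat <- (point_est - 0) / se

# Conservative degrees of freedom: smaller sample size minus 1
df <- min(n_smk - 1, n_non - 1)

c(point_est = point_est, se = se, t = t_stat, df = df)
```

Reversing the order of subtraction would flip the sign of the test statistic but leave a two-sided p-value unchanged.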

Case Study II: Two Sample t-test (2/4)

Case Study II: Two Sample t-test (3/4)

Case Study II: Two Sample t-test (4/4)

Using R to compute the p-value

df <- 113 # degrees of freedom: min(867, 114) - 1
t <- -3.8165 # test statistic
pt(t, df) # lower-tail area (one-sided p-value); double it for a two-sided test
## [1] 0.000110671
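Because the research question asks whether the two means are different, a two-sided p-value is the natural choice, whereas pt(t, df) alone returns only the lower-tail area. A minimal sketch of the two-sided calculation follows; the significance level of 0.05 is an assumption and is not stated in these slides.

```r
df <- 113        # degrees of freedom, as in the slide
t  <- -3.8165    # test statistic, as in the slide
alpha <- 0.05    # significance level (an assumption for this sketch)

# Two-sided p-value: probability in both tails beyond |t|
p_two_sided <- 2 * pt(-abs(t), df)
p_two_sided

# Decision at the assumed significance level: TRUE means reject H0
p_two_sided < alpha
```

Writing the tail as -abs(t) keeps the calculation correct whichever sign the test statistic has.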

Conclusions:

Activity: Determine Confidence Intervals for Difference in Means

  1. Make sure you have a copy of the W 3/26 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/
Speegle, D., & Clair, B. (2021). Probability, statistics, and data: A fresh approach using R. Chapman and Hall/CRC. https://probstatsdata.com/