Inference for Difference of Two Means

Elementary Statistics

MTH-161D | Spring 2025 | University of Portland

April 2, 2025

Objectives

Develop an understanding of how to compute confidence intervals for a difference in means
Know how to determine the degrees of freedom for two means
Activity: Determine Confidence Intervals for Difference in Means

Previously… (1/2)

The \(t\)-distribution

The larger the degrees of freedom, the more closely the \(t\)-distribution resembles the standard normal distribution.
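As a quick check of this statement, the sketch below compares the 97.5th percentile of the \(t\)-distribution with the same percentile of the standard normal distribution. The particular degrees of freedom listed are an arbitrary choice made for illustration.

```r
# Degrees of freedom to compare (arbitrary illustrative choices)
dfs <- c(2, 5, 10, 24, 30, 100, 1000)

# Critical values that leave 2.5% in the upper tail
t_star <- qt(0.975, dfs)   # t-distribution, one value per df
z_star <- qnorm(0.975)     # standard normal

# The gap between t-star and z-star shrinks as df grows
data.frame(df = dfs,
           t_star = round(t_star, 4),
           gap = round(t_star - z_star, 4))
```

The gap column should shrink toward zero as the degrees of freedom increase.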

Previously… (2/2)

Confidence Intervals for One Mean

\[ \begin{aligned} \bar{x} \ &\pm\ t^*_{df} \times \frac{s}{\sqrt{n}} \end{aligned} \]

\[ \begin{aligned} \bar{x} & \longrightarrow \text{sample mean (point estimate)} \\ s & \longrightarrow \text{sample standard deviation} \\ n & \longrightarrow \text{sample size} \\ t^*_{df} & \longrightarrow \text{critical value (t-distribution with degrees of freedom } df \text{)} \end{aligned} \]
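The one-mean formula can be sketched in R as follows. The helper name one_mean_ci and the summary values passed to it are made up for illustration; they do not come from the course data.

```r
# Hypothetical helper: t-interval for one mean from summary statistics
one_mean_ci <- function(xbar, s, n, cl = 0.95) {
  df <- n - 1                          # degrees of freedom
  t_star <- qt(1 - (1 - cl) / 2, df)   # positive critical value
  me <- t_star * s / sqrt(n)           # margin of error
  c(lower = xbar - me, upper = xbar + me)
}

# Made-up summary statistics, for illustration only
ci <- one_mean_ci(xbar = 50, s = 10, n = 25)
round(ci, 4)
```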

Case Study I: Fuel Efficiency in the City

The problem below is adapted, with slight modifications, from your textbook, OpenIntro: Introduction to Modern Statistics, Section 20.6. Consider the research study described below.

Each year the US Environmental Protection Agency (EPA) releases fuel economy data on cars manufactured in that year. Below are summary statistics on fuel efficiency (in miles/gallon) from random samples of cars with manual and automatic transmissions manufactured in 2021. Do these data provide strong evidence of a difference between the average fuel efficiency of cars with manual and automatic transmissions in terms of their average city mileage? (US DOE EPA 2021)

We will compute the 95% confidence interval for the true difference in means \(\mu_{automatic} - \mu_{manual}\).

CITY	Mean	SD	n
Automatic	17.44	3.44	25
Manual	22.68	4.58	25

Case Study I: Conditions

Conditions.
- Independence (extended). The data are independent within and between the two groups, e.g., the data come from independent random samples or from a randomized experiment.
- Normality. We need a large enough sample size for each group. We check for extreme outliers in each group separately.

Here, we see two outliers in the manual group. However, both groups show reasonably well-behaved distributions with balanced outliers, so with 25 observations in each group we can “ignore” the outliers and assume normality of the sampling distribution of the means.

Case Study I: Two-sample t-interval (1/3)

Two-sample t-intervals

\[ \begin{aligned} \text{point estimate} \ &\pm\ t^*_{df} \times SE \\ \bar{x}_1 - \bar{x}_2 \ &\pm\ t^*_{df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \end{aligned} \]

The margin of error is \(ME = t^*_{df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\) where \(t^*_{df}\) is calculated from a specified percentile on the t-distribution with df degrees of freedom.
The official formula for the degrees of freedom is quite complex and is generally computed using R, so instead you may use the smaller of \(n_1 - 1\) and \(n_2 - 1\) for convenience if \(n_1 \approx n_2\).

Case Study I: Two-sample t-interval (2/3)

Standard Error \[ \begin{aligned} SE & = \sqrt{\frac{s_{automatic}^2}{n_{automatic}} + \frac{s_{manual}^2}{n_{manual}}} \\ & = \sqrt{\frac{3.44^2}{25} + \frac{4.58^2}{25}} \\ SE & = 1.1456 \end{aligned} \]
The degrees of freedom are \(df = 24\).
For a 95% confidence level, we find the critical value \(t^*_{df}\) for which the upper tail area is equal to 2.5%: \(t^*_{24} = 2.0639.\) The area below \(-t^*_{24} = -2.0639\) is also equal to 2.5%.

Using R to find the critical \(t^*\)

cl <- 0.95 # confidence level
lt <- (1-cl)/2 # lower tail probability
df <- 24 # degrees of freedom
qt(lt, df) # lower-tail quantile; t-star is its absolute value

## [1] -2.063899

A note on the degrees of freedom: Our example has equal sample sizes in the two groups, so the degrees of freedom are \(\min(25 - 1, 25 - 1) = 24\).
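The quantile returned by qt() above is the lower-tail value, so it is negative. As a minimal sketch, here are two equivalent ways to get the positive critical value directly. The variable names t_star_a and t_star_b are chosen here for illustration.

```r
cl <- 0.95            # confidence level
lt <- (1 - cl) / 2    # tail probability on each side
df <- 24              # degrees of freedom

# Option 1: ask for the quantile with 97.5% of the area below it
t_star_a <- qt(1 - lt, df)

# Option 2: ask for the quantile with 2.5% of the area above it
t_star_b <- qt(lt, df, lower.tail = FALSE)

c(t_star_a, t_star_b)
```

Both calls should return the same positive value, the mirror image of the lower-tail quantile.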

Case Study I: Two-sample t-interval (3/3)

The 95% confidence interval is computed as follows. \[ \begin{aligned} \bar{x}_{automatic} - \bar{x}_{manual} & \pm ME \\ 17.44 - 22.68 & \pm t^*_{df} \times \sqrt{\frac{s_{automatic}^2}{n_{automatic}} + \frac{s_{manual}^2}{n_{manual}}} \\ 17.44 - 22.68 & \pm 2.0639 \times 1.1456 \\ -5.24 & \pm 2.3644 \end{aligned} \] \[(-7.6044,-2.8756)\]

Therefore, we are 95% confident that the true difference in mean city fuel efficiency (miles/gallon) between automatic and manual cars is between 2.8756 and 7.6044 in absolute value.

Note that the interval endpoints are originally negative because of the order in which the difference is computed (automatic minus manual). A negative difference indicates that cars with manual transmissions are more fuel efficient than cars with automatic transmissions.
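The whole calculation can be reproduced in R from the summary statistics. This is a sketch with variable names chosen for illustration. It uses the means exactly as listed in the CITY table (17.44 and 22.68), so the endpoints can differ slightly from a hand calculation that rounds the means first.

```r
# Summary statistics copied from the CITY table
xbar_auto   <- 17.44; s_auto   <- 3.44; n_auto   <- 25
xbar_manual <- 22.68; s_manual <- 4.58; n_manual <- 25

cl <- 0.95                                   # confidence level

# Standard error of the difference in sample means
se <- sqrt(s_auto^2 / n_auto + s_manual^2 / n_manual)

# Conservative degrees of freedom: smaller of n1 - 1 and n2 - 1
df <- min(n_auto - 1, n_manual - 1)

t_star <- qt(1 - (1 - cl) / 2, df)           # positive critical value
me <- t_star * se                            # margin of error

point_est <- xbar_auto - xbar_manual         # automatic minus manual
ci <- c(lower = point_est - me, upper = point_est + me)

round(c(se = se, t_star = t_star, me = me), 4)
round(ci, 4)
```

Swapping the order of the groups in point_est flips the sign of both endpoints but leaves the width of the interval unchanged.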

Degrees of Freedom

One-sample t-interval

\[df = n - 1\]

Two-sample t-interval

Minimum sample size. If \(n_1 \approx n_2\), then \(df = \min{\left(n_1 - 1,n_2 - 1\right)}\). This choice is conservative: it gives a wider interval and lower statistical power than Welch’s formula.
Pooled. If \(n_1 \ne n_2\) and the two groups can be assumed to have equal standard deviations (\(s_1 \approx s_2\)), then \(df = n_1 + n_2 - 2\).
Welch’s Formula. If \(n_1 \ne n_2\) and \(s_1 \ne s_2\), then \(df = \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^2}{\left(\frac{1}{n_1 - 1}\right) \left( \frac{s_1^2}{n_1} \right)^2 + \left(\frac{1}{n_2 - 1}\right) \left( \frac{s_2^2}{n_2} \right)^2}\). This \(df\) is the default in most statistical software, including the t.test() function in R.

If the population variance is unknown, use the sample variance \(s^2\). If the population variance is known, use the population variance \(\sigma^2\). Most real-world problems involve sample variances, especially for statistical inference.
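To see how much the choice of degrees of freedom matters, the sketch below evaluates all three options for the Case Study I summary statistics and the 95% critical value each one implies. The variable names are chosen for illustration. It compares critical values only and does not change the standard error formula.

```r
# Summary statistics from the CITY table
s1 <- 3.44; n1 <- 25   # automatic
s2 <- 4.58; n2 <- 25   # manual

# Per-group variance of the sample mean
v1 <- s1^2 / n1
v2 <- s2^2 / n2

df_min    <- min(n1 - 1, n2 - 1)   # minimum sample size rule
df_pooled <- n1 + n2 - 2           # pooled rule
df_welch  <- (v1 + v2)^2 /
  (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))   # Welch's formula

dfs <- c(minimum = df_min, welch = df_welch, pooled = df_pooled)
t_stars <- qt(0.975, dfs)          # 95% critical value for each df

round(rbind(df = dfs, t_star = t_stars), 4)
```

For these data the Welch value comes out at roughly 44.5, between the conservative 24 and the pooled 48, so the minimum sample size rule gives the largest critical value and therefore the widest interval.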

Activity: Determine Confidence Intervals for Difference in Means

Make sure you have a copy of the W 4/2 Worksheet. This will be handed out physically. This worksheet will be available on Moodle after class.
Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
Get together with another student.
Discuss your results.
Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/