Hypothesis Testing for Means

Elementary Statistics

MTH-161D | Spring 2025 | University of Portland

April 4, 2025

Objectives

Develop an understanding of hypothesis testing for means
Know how to compute the test statistic for means
Activity: Conduct a Hypothesis Test for Means

Previously… (1/2)

The \(t\)-distribution

The larger the degrees of freedom, the more closely the \(t\)-distribution resembles the standard normal distribution.
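
One way to see this convergence is to compare critical values. The sketch below uses base R's `qt` and `qnorm`; the degrees of freedom listed are arbitrary choices for illustration, not values from the slides.

```r
# Compare the 97.5th percentile of the t-distribution with that of the
# standard normal as the degrees of freedom grow.
# The df values are arbitrary, chosen only for illustration.
dfs <- c(2, 5, 10, 30, 113, 1000)
t_crit <- qt(0.975, df = dfs)   # t critical values, one per df
z_crit <- qnorm(0.975)          # standard normal critical value

# The gap between t_crit and z_crit shrinks toward zero as df increases
data.frame(df = dfs, t_crit = t_crit, gap = t_crit - z_crit)
```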

Previously… (2/2)

Confidence Intervals for One Mean

\[ \begin{aligned} \bar{x} \ &\pm\ t^*_{df} \times \frac{s}{\sqrt{n}} \end{aligned} \]
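
As a minimal sketch, the formula can be evaluated in R from summary statistics alone. The numbers are the smoker row of the Case Study I summary table in these slides; the 95% confidence level and the variable names are assumptions made for illustration.

```r
# Summary statistics for the smoker group (Case Study I summary table)
x_bar <- 6.677193   # sample mean
s     <- 1.596645   # sample standard deviation
n     <- 114        # sample size

conf_level <- 0.95                                   # assumed for illustration
t_star <- qt(1 - (1 - conf_level) / 2, df = n - 1)   # critical value t*_{df}
se     <- s / sqrt(n)                                # standard error of the mean
ci     <- x_bar + c(-1, 1) * t_star * se             # lower and upper limits
ci
```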

Confidence Intervals for Difference of Two Means

\[ \begin{aligned} \bar{x}_1 - \bar{x}_2 \ &\pm\ t^*_{df} \times \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \end{aligned} \]
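
The same idea carries over to two groups. The sketch below plugs in both rows of the Case Study I summary table and uses \(df = \min(n_1 - 1, n_2 - 1)\), the rule applied in the two sample t-test later in these slides. The 95% confidence level and the variable names are assumptions for illustration.

```r
# Summary statistics for both groups (Case Study I summary table)
x_bar <- c(smoker = 6.677193, nonsmoker = 7.269873)
s     <- c(smoker = 1.596645, nonsmoker = 1.232846)
n     <- c(smoker = 114,      nonsmoker = 867)

conf_level <- 0.95                                   # assumed for illustration
df_min  <- min(n - 1)                                # smaller of the two n - 1 values
t_star  <- qt(1 - (1 - conf_level) / 2, df = df_min) # critical value t*_{df}
se_diff <- sqrt(sum(s^2 / n))                        # SE of the difference in means

# Difference is taken as smoker minus nonsmoker
diff_means <- unname(x_bar["smoker"] - x_bar["nonsmoker"])
ci_diff    <- diff_means + c(-1, 1) * t_star * se_diff
ci_diff
```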

Case Study I

Every year, the US releases to the public a large data set containing information on births recorded in the country. This data set has been of interest to medical researchers who are studying the relation between habits and practices of expectant mothers and the birth of their children. We will work with a random sample of 1,000 cases from the data set released in 2014.

Here are four examples in the data set.

fage	mage	weeks	visits	weight	sex	habit
34	34	37	14	6.96	male	nonsmoker
36	31	41	12	8.86	female	nonsmoker
37	36	37	10	7.51	female	nonsmoker
NA	16	38	NA	6.19	male	nonsmoker

Case Study I: Baby Weights - Smoker vs Non-Smoker

We would like to know, is there convincing evidence that newborns from mothers who smoke have a different average birth weight than newborns from mothers who don’t smoke?

Here are the summary statistics for the data set.

habit	n	Mean	SD
nonsmoker	867	7.269873	1.232846
smoker	114	6.677193	1.596645

Case Study I: CLT Conditions

Conditions:

The data come from a simple random sample, so the observations are independent, both within and between samples.
Both groups have more than 30 observations, and an inspection of the data shows no particularly extreme outliers.

Since both conditions are satisfied, the difference in sample means may be modeled using a \(t\)-distribution.

Case Study I: Examining the Distributions (1/2)

The top panel represents birth weights for infants whose mothers smoked during pregnancy. The bottom panel represents birth weights for infants whose mothers did not smoke during pregnancy.

Case Study I: Examining the Distributions (2/2)

Case Study I: One Sample t-test (1/4)

Consider one group (smoking) from the data. Suppose it is known that newborn babies have an average weight of \(7.5\) lbs. We want to test whether the average weight for the smoking group is less than this value using a one sample t-test.

Do the data from the smoking group provide convincing evidence that the average weight is less than \(7.5\) lbs?

Case Study I: One Sample t-test (2/4)

Null Hypothesis \(H_0\): The average weight of the smoking group is \(7.5\) lbs. \[\mu = 7.5\]
Alternative Hypothesis \(H_A\): The average weight of the smoking group is less than \(7.5\) lbs. \[\mu < 7.5\]
The null value is \(\mu_0 = 7.5\). The point estimate is \(\bar{x} = 6.68\) and the sample standard deviation is \(s = 1.5966\).
We set the significance level \(\alpha = 0.01\).

Case Study I: One Sample t-test (3/4)

Compute the standard error \[ \begin{aligned} SE & = \frac{s}{\sqrt{n}} \\ & = \frac{1.5966}{\sqrt{114}} \\ SE & = 0.1495 \end{aligned} \]
Compute the T statistic \[ \begin{aligned} t & = \frac{\bar{x} - \mu_0}{SE} \\ & = \frac{6.68 - 7.5}{0.1495} \\ t & = -5.4850 \end{aligned} \]
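
The two computations above can be reproduced in R as a quick check. This is a sketch using the rounded inputs from the slides; the variable names are illustrative. Because R carries the standard error at full precision rather than rounding it to four decimals first, the test statistic may differ slightly from the slide value in the later decimal places.

```r
# Inputs as given on the slides
x_bar <- 6.68     # point estimate (sample mean, smoking group)
mu_0  <- 7.5      # null value
s     <- 1.5966   # sample standard deviation
n     <- 114      # sample size

se     <- s / sqrt(n)          # standard error
t_stat <- (x_bar - mu_0) / se  # t statistic
round(c(SE = se, t = t_stat), 4)
```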

Case Study I: One Sample t-test (4/4)

The degrees of freedom are \(df = n - 1 = 114 - 1 = 113\).
The p-value is \(1.28 \times 10^{-7} \approx 0\).

Using R to find the p-value

df <- 113       # degrees of freedom
t <- -5.48495   # test statistic
pt(t, df)       # lower-tail p-value, since H_A is mu < 7.5

## [1] 1.278671e-07

Conclusions:

Since the p-value is less than the significance level of \(0.01\) (the p-value is very small), we reject the null hypothesis. The data provide strong evidence that the average weight for the smoking group is less than \(7.5\) lbs.
The negative t statistic reflects that the sample mean falls below the null value, which is the direction stated in the alternative hypothesis.
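
The decision step can also be written out in R. The sketch below compares the p-value with \(\alpha\) and, as an equivalent check, compares the test statistic with the lower-tail critical value from `qt`. The variable names are illustrative, and the critical-value comparison is an added cross-check rather than a step shown on the slides.

```r
alpha  <- 0.01       # significance level
df     <- 113        # degrees of freedom
t_stat <- -5.48495   # test statistic from the slides

p_value <- pt(t_stat, df)   # lower-tail area, P(T <= t_stat)
t_crit  <- qt(alpha, df)    # lower-tail critical value for this alpha

reject_by_p    <- p_value < alpha   # p-value rule
reject_by_crit <- t_stat < t_crit   # critical-value rule (same decision)

c(p_value = p_value, t_crit = t_crit)
c(reject_by_p = reject_by_p, reject_by_crit = reject_by_crit)
```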

Case Study I: Two Sample t-test (1/4)

habit	n	Mean	SD
nonsmoker	867	7.269873	1.232846
smoker	114	6.677193	1.596645

Is there a difference in mean weight between the smoking group and the nonsmoking group?

Case Study I: Two Sample t-test (2/4)

Null Hypothesis \(H_0\): There is no difference in means between the smoking and nonsmoking groups. \[\mu_{smoking} = \mu_{nonsmoking}\]
Alternative Hypothesis \(H_A\): There is a difference in means between the smoking and nonsmoking groups. In particular, the mean weight in the smoking group is less than the mean weight in the nonsmoking group. \[\mu_{smoking} < \mu_{nonsmoking}\]
The null value is \(\mu_0 = 0\). The point estimate is \(\bar{x}_{smoking} - \bar{x}_{nonsmoking} = -0.5927\), and the sample standard deviations are \(s_{smoking} = 1.5966\) and \(s_{nonsmoking} = 1.2328\).
We set the significance level \(\alpha = 0.01\).

Case Study I: Two Sample t-test (3/4)

Compute the standard error \[ \begin{aligned} SE & = \sqrt{\frac{s_{smoking}^2}{n_{smoking}} + \frac{s_{nonsmoking}^2}{n_{nonsmoking}}} \\ & = \sqrt{\frac{1.5966^2}{114} + \frac{1.2328^2}{867}} \\ SE & = 0.1553 \end{aligned} \]
Compute the T statistic \[ \begin{aligned} t & = \frac{\bar{x}_{smoking} - \bar{x}_{nonsmoking} - \mu_0}{SE} \\ & = \frac{-0.5927 - 0}{0.1553} \\ t & = -3.8165 \end{aligned} \]
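
As a quick check, the same arithmetic can be sketched in R using the rounded inputs from the slides. The variable names are illustrative, and small differences from the slide values in the later decimal places are expected because R does not round intermediate results.

```r
# Inputs as given on the slides
diff_means   <- -0.5927   # x_bar_smoking - x_bar_nonsmoking
mu_0         <- 0         # null value for the difference
s_smoking    <- 1.5966
s_nonsmoking <- 1.2328
n_smoking    <- 114
n_nonsmoking <- 867

# Standard error of the difference in sample means
se <- sqrt(s_smoking^2 / n_smoking + s_nonsmoking^2 / n_nonsmoking)

# t statistic for the difference
t_stat <- (diff_means - mu_0) / se
round(c(SE = se, t = t_stat), 4)
```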

Case Study I: Two Sample t-test (4/4)

The degrees of freedom are \(df = \min(n_{smoking} - 1, n_{nonsmoking} - 1) = 114 - 1 = 113\).
The p-value is approximately \(0.0001\).

Using R to compute the p-value

df <- 113      # degrees of freedom
t <- -3.8165   # test statistic
pt(t, df)      # lower-tail p-value, since H_A is mu_smoking < mu_nonsmoking

## [1] 0.000110671

Conclusions:

Since the p-value is less than the significance level of \(0.01\) (the p-value is very small), we reject the null hypothesis. There is strong evidence of a difference in mean weight between the smoking and nonsmoking groups.
Since the t statistic is negative, and the difference was computed as smoking minus nonsmoking, we can say that the average weight is lower in the smoking group than in the nonsmoking group.
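
For the activity, it may help to bundle the whole procedure into one reusable function. The sketch below is a hypothetical helper (the name `two_sample_t_summary` and its arguments are not from the slides) that works from summary statistics, uses \(df = \min(n_1 - 1, n_2 - 1)\) as above, and returns the lower-tail p-value for a "less than" alternative.

```r
# Hypothetical helper: two sample t-test from summary statistics.
# Uses df = min(n1 - 1, n2 - 1) and a lower-tail ("less than") alternative.
two_sample_t_summary <- function(mean1, sd1, n1, mean2, sd2, n2, null_value = 0) {
  se     <- sqrt(sd1^2 / n1 + sd2^2 / n2)        # standard error of the difference
  t_stat <- (mean1 - mean2 - null_value) / se    # t statistic
  df     <- min(n1 - 1, n2 - 1)                  # degrees of freedom
  list(se = se, t = t_stat, df = df, p_lower = pt(t_stat, df))
}

# Group 1 = smoker, group 2 = nonsmoker (values from the summary table)
res <- two_sample_t_summary(6.677193, 1.596645, 114,
                            7.269873, 1.232846, 867)
res
res$p_lower < 0.01   # TRUE means reject H_0 at alpha = 0.01
```

For a "greater than" alternative, the upper-tail area `pt(t_stat, df, lower.tail = FALSE)` would be used instead.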

Activity: Conduct a Hypothesis Test for Means

Make sure you have a copy of the F 4/4 Worksheet. This will be handed out physically. This worksheet will be available on Moodle after class.
Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
Get together with another student.
Discuss your results.
Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/