Hypothesis Testing for Linear Regression

Elementary Statistics

MTH-161D | Spring 2025 | University of Portland

April 16, 2025

Objectives

Previously…

Linear Regression

\[ y = \beta_0 + \beta_1 x + \epsilon \]

Case Study I

Consider the births data originally gathered by the US Department of Health and Human Services. The births14 data set can be found in the openintro R package.

Case Study I: The Linear Model

We want to predict a baby's birth weight from the length of the pregnancy in weeks. The population linear model is \[y_{weight} = \beta_0 + \beta_1 x_{weeks} + \epsilon\]

The relevant hypotheses for the linear model setting can be written in terms of the population slope parameter \(\beta_1\): \[H_0: \beta_1 = 0 \quad \text{versus} \quad H_A: \beta_1 \neq 0\]

Here the population refers to a larger population of births in the US.

Let’s set the significance level to be \(\alpha = 0.01\).

Technical Conditions

Case Study I: Residual Analysis (1/2)

Case Study I: Residual Analysis (2/2)

Case Study I: Least Squares Approximation

term estimate SE
\(b_0\) -3.5980 0.5227
\(b_1\) 0.2792 0.0135

The least squares regression model uses the data to find a sample linear fit: \[\hat{y}_{weight} = -3.5980 + 0.2792 \times x_{weeks}.\]
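As a minimal sketch of how a table like this can be produced, the code below fits a least squares line with `lm()`. To stay self-contained it simulates a stand-in data set: the names `sim_births`, `weeks`, `weight` and all simulation parameters are assumptions for illustration, not the real data. For the case study, the data frame would be `births14` from the openintro package.

```r
# Simulated stand-in data (assumption: NOT the real births14 data).
set.seed(161)
n <- 1000
weeks <- round(rnorm(n, mean = 38.7, sd = 2.6))
weight <- -3.6 + 0.28 * weeks + rnorm(n, mean = 0, sd = 1.1)
sim_births <- data.frame(weeks = weeks, weight = weight)

# Least squares fit of weight on weeks.
fit <- lm(weight ~ weeks, data = sim_births)

# Estimates (b_0, b_1) and their standard errors.
coef_table <- coef(summary(fit))[, c("Estimate", "Std. Error")]
print(round(coef_table, 4))
```

The row labelled `weeks` holds the slope estimate and its standard error; the `(Intercept)` row holds the intercept.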

Hypothesis Testing - Randomization Method (1/2)

Two different permutations of the weight variable with slightly different least squares regression lines.

Hypothesis Testing - Randomization Method (2/2)

Histogram of slopes given different permutations of the weight variable. The vertical red line is at the observed value of the slope, 0.28.
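The randomization idea can be sketched in a few lines of R: shuffle the response so that any link with the predictor is broken, refit the line, and repeat to build the distribution of slopes expected under the null hypothesis. The data here are simulated stand-ins, and the helper `slope()`, the sample size, and the number of permutations are assumptions for illustration, not the settings used for the figure.

```r
# Simulated stand-in data (assumption: NOT the real births14 data).
set.seed(161)
n <- 200
weeks <- round(rnorm(n, mean = 38.7, sd = 2.6))
weight <- -3.6 + 0.28 * weeks + rnorm(n, sd = 1.1)

# Slope of the least squares line for response y and predictor x.
slope <- function(y, x) unname(coef(lm(y ~ x))[2])

obs_slope <- slope(weight, weeks)

# Shuffle weight to break any link with weeks, then refit; repeat many times.
n_perm <- 1000
perm_slopes <- replicate(n_perm, slope(sample(weight), weeks))

# Two-sided p-value: share of permuted slopes at least as extreme as observed.
p_value <- mean(abs(perm_slopes) >= abs(obs_slope))
p_value
```

Calling `hist(perm_slopes)` followed by `abline(v = obs_slope, col = "red")` would draw a histogram in the same style as the one described in the caption.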

Hypothesis Testing - Theoretical Method (1/2)

term estimate SE
\(b_0\) -3.5980 0.5227
\(b_1\) 0.2792 0.0135
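The test statistic for the slope is the estimate minus the null value, divided by its standard error. A minimal sketch using the rounded table values follows (the variable names are illustrative). Because the inputs are rounded, the result can differ in the last digit from a statistic computed from unrounded software output.

```r
b1 <- 0.2792      # slope estimate from the table
se_b1 <- 0.0135   # standard error of the slope
null_value <- 0   # slope under the null hypothesis

# T = (estimate - null value) / SE
t_stat <- (b1 - null_value) / se_b1
round(t_stat, 2)
```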

Hypothesis Testing - Theoretical Method (2/2)

Using R to compute the p-value

df <- 998 # degrees of freedom
t_stat <- 20.69 # test statistic (named t_stat to avoid masking R's t() function)
2*(1-pt(t_stat,df)) # two-sided p-value
## [1] 0
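The printed 0 is a floating-point artifact: the lower-tail probability is so close to 1 that subtracting it from 1 rounds to exactly zero. As a sketch of one alternative, `pt()` accepts `lower.tail = FALSE`, which computes the upper-tail area directly and returns a tiny positive number instead. The names `t_stat` and `p_value` are illustrative.

```r
df <- 998        # degrees of freedom
t_stat <- 20.69  # test statistic

# Upper-tail area computed directly, doubled for a two-sided test.
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
p_value

# Compare with the significance level alpha = 0.01.
p_value < 0.01
```

Either way the conclusion is the same: the p-value is far below \(\alpha = 0.01\).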

Confidence Interval (1/2)

99% Confidence interval for the slope

\[b_1 \pm t_{df}^* \text{SE}_{b_1}\]

Using R to compute the critical t-star

df <- 998 # degrees of freedom
lt <- (1-0.99)/2 # lower tail probability
qt(lt,df) # lower-tail quantile; the critical t-star is its absolute value
## [1] -2.580765

\[ \begin{aligned} 0.2792 & \pm 2.5808 \times 0.0135 \end{aligned} \] \[(0.2444,0.3140)\]

We are 99% confident that the true population slope \(\beta_1\) is between 0.2444 and 0.3140. Note that the null value of 0 (no slope) is not within the interval, which agrees with rejecting the null hypothesis at \(\alpha = 0.01\).
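The interval arithmetic can be checked in R. This is a minimal sketch from the rounded table values; the variable names are illustrative. Asking `qt()` for the upper-tail quantile returns the positive critical value directly.

```r
b1 <- 0.2792       # slope estimate
se_b1 <- 0.0135    # standard error of the slope
df <- 998          # degrees of freedom
conf_level <- 0.99

# Upper-tail quantile gives the positive critical value t*.
t_star <- qt(1 - (1 - conf_level) / 2, df)

# b1 minus and plus the margin of error.
ci <- b1 + c(-1, 1) * t_star * se_b1
round(ci, 4)
```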

Confidence Interval (2/2)

Activity: Conduct a Hypothesis Test for Linear Regression

  1. Make sure you have a copy of the W 4/16 Worksheet. This will be handed out physically. This worksheet will be available on Moodle after class.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/