Hypothesis Testing for Linear Regression

Applied Statistics

MTH-361A | Spring 2026 | University of Portland

Objectives

Case Study I

Consider the births14 data, originally gathered by the US Department of Health and Human Services. The births14 data set is available in the openintro R package.

Case Study I: The Linear Model

We want to predict a baby's birth weight (in pounds) from the length of the pregnancy (in weeks). The population linear model is \[y_{weight} = \beta_0 + \beta_1 x_{weeks} + e\] where \(e\) is the error term.

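The relevant hypotheses for the linear model setting are written in terms of the population slope parameter \(\beta_1\): \[H_0: \beta_1 = 0 \qquad \text{vs.} \qquad H_A: \beta_1 \neq 0\] The null hypothesis says there is no linear relationship between weeks and weight; the alternative says there is one.

Here the population refers to a larger population of births in the US.

Let’s set the significance level to \(\alpha = 0.01\).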

Technical Conditions

Case Study I: Residual Analysis (1/2)

Case Study I: Residual Analysis (2/2)

Case Study I: Least Squares Approximation

| Term | Estimate | SE |
|---|---|---|
| \(b_0\) (intercept) | -3.5980 | 0.5227 |
| \(b_1\) (slope, weeks) | 0.2792 | 0.0135 |

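Least squares regression uses the sample data to estimate the linear fit: \[\hat{y}_{weight} = -3.5980 + 0.2792 \times x_{weeks}.\] The slope estimate means that each additional week of pregnancy is associated with an estimated 0.2792-pound increase in average birth weight.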
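To see how the fitted line is used, here is a small R sketch that plugs a few pregnancy lengths into the sample equation. The coefficients come from the table above; the names `b0`, `b1`, `weeks_new`, and `pred_weight` and the chosen week values are illustrative, not part of the original analysis.

```r
# Least squares estimates from the case study (rounded to 4 decimals)
b0 <- -3.5980   # intercept
b1 <- 0.2792    # slope: estimated change in weight (lb) per extra week

# Hypothetical pregnancy lengths (weeks) to plug into the fitted line
weeks_new <- c(37, 39, 41)

# Predicted birth weight (lb): y_hat = b0 + b1 * x
pred_weight <- b0 + b1 * weeks_new
pred_weight     # about 6.73, 7.29, and 7.85 lb
```

Each two-week step raises the prediction by \(2 \times 0.2792 = 0.5584\) lb. Predictions for pregnancy lengths far outside the range observed in the data would be extrapolation.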

R code:

library(openintro)                    # provides the births14 data frame
lm(weight ~ weeks, data = births14)   # least squares fit of weight on weeks

where births14 is a data frame and weight and weeks are numerical variables in it.
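For simple linear regression, the least squares line that `lm()` finds has a closed form: \(b_1 = \text{cov}(x, y)/\text{var}(x)\) and \(b_0 = \bar{y} - b_1 \bar{x}\). The sketch below checks these formulas against `lm()`. The data are simulated for illustration (hypothetical values, not births14), so the block runs without the openintro package.

```r
set.seed(2026)
# Simulated stand-in data (hypothetical, NOT births14)
x <- round(runif(100, min = 34, max = 42))    # weeks
y <- -3.6 + 0.28 * x + rnorm(100, sd = 1.2)   # weight (lb)

# Closed-form least squares estimates
b1_hat <- cov(x, y) / var(x)
b0_hat <- mean(y) - b1_hat * mean(x)

# The same line fitted by lm(); the two pairs of coefficients should agree
fit <- lm(y ~ x)
c(b0_hat, b1_hat)
coef(fit)
```

Applying the same two formulas to the weeks and weight columns of births14 (after dropping any rows with missing values) should reproduce the estimates in the table up to rounding.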

Hypothesis Testing - Randomization Method (1/2)

Two random permutations of the weight variable, each producing a slightly different least squares regression line.

Hypothesis Testing - Randomization Method (2/2)

Histogram of the slopes from many random permutations of the weight variable. The vertical red line marks the observed slope, 0.28.
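The randomization method can be sketched in a few lines of R. This is a minimal sketch, not the code behind the figures: it shuffles the response many times, computes the least squares slope for each shuffle, and reports the two-sided share of permuted slopes at least as extreme as the observed one. The function `perm_slopes`, the seed, and the simulated data are hypothetical stand-ins for births14.

```r
# Randomization (permutation) test for a regression slope: a minimal sketch.
# Shuffling y breaks any link with x, so slopes from shuffled data
# approximate the distribution of b_1 when H0: beta_1 = 0 is true.
perm_slopes <- function(x, y, n_perm = 1000) {
  replicate(n_perm, {
    y_shuffled <- sample(y)          # one random permutation of the response
    cov(x, y_shuffled) / var(x)      # least squares slope for the shuffled data
  })
}

set.seed(361)
# Simulated stand-in data (hypothetical, NOT births14)
x_sim <- round(runif(200, min = 34, max = 42))      # weeks
y_sim <- -3.6 + 0.28 * x_sim + rnorm(200, sd = 1)   # weight (lb)

b1_obs <- coef(lm(y_sim ~ x_sim))[["x_sim"]]        # observed slope
null_slopes <- perm_slopes(x_sim, y_sim, n_perm = 500)

# Two-sided p-value: share of permuted slopes at least as extreme as observed
p_perm <- mean(abs(null_slopes) >= abs(b1_obs))
p_perm
```

For the case study, the same function would be called with the weeks and weight columns of births14 (after dropping rows with missing values), and `hist(null_slopes)` with a vertical line at the observed slope gives a plot of the kind described above.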

Hypothesis Testing - Theoretical Method (1/2)

| Term | Estimate | SE |
|---|---|---|
| \(b_0\) (intercept) | -3.5980 | 0.5227 |
| \(b_1\) (slope, weeks) | 0.2792 | 0.0135 |
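From this table, the test statistic for \(H_0: \beta_1 = 0\) is the slope estimate minus its null value, divided by its standard error, and it is compared against a \(t\) distribution with \(df = n - 2\). A worked check with the rounded values, assuming \(n = 1000\) births (as the \(df = 998\) used in the R code implies):

\[T = \frac{b_1 - 0}{\text{SE}_{b_1}} = \frac{0.2792 - 0}{0.0135} \approx 20.68, \qquad df = n - 2 = 1000 - 2 = 998\]

The rounded inputs give about 20.68; the value 20.69 used in the R code presumably reflects the unrounded estimate and standard error.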

Hypothesis Testing - Theoretical Method (2/2)

Using R to compute the p-value

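df <- 998        # degrees of freedom (n - 2)
t_stat <- 20.69  # test statistic
2 * (1 - pt(t_stat, df))
## [1] 0
# 1 - pt() rounds to exactly 0 in floating point, but the true p-value is tiny
# and positive; requesting the upper tail directly avoids the rounding
2 * pt(t_stat, df, lower.tail = FALSE)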
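To connect the p-value to the significance level \(\alpha = 0.01\) chosen earlier, the sketch below makes the decision in two equivalent ways: comparing the p-value with \(\alpha\), and comparing \(|t|\) with the critical value \(t^*\). The variable names are illustrative.

```r
alpha <- 0.01    # significance level chosen for the case study
df <- 998        # degrees of freedom (n - 2)
t_stat <- 20.69  # observed test statistic

# p-value approach: two-sided tail area beyond |t_stat|
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)
reject_by_p <- p_value < alpha

# Critical-value approach: reject when |t_stat| exceeds t*
t_crit <- qt(1 - alpha / 2, df)
reject_by_t <- abs(t_stat) > t_crit

c(p_value = p_value, t_crit = t_crit)
c(reject_by_p = reject_by_p, reject_by_t = reject_by_t)
```

Both approaches reject \(H_0\): the data provide strong evidence of a linear relationship between pregnancy length and birth weight.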

Confidence Interval (1/2)

99% confidence interval for the slope

\[b_1 \pm t_{df}^* \text{SE}_{b_1}\]

Using R to compute the critical value \(t_{df}^*\)

df <- 998              # degrees of freedom (n - 2)
lt <- (1 - 0.99) / 2   # lower-tail probability for a 99% interval
qt(lt, df)             # lower critical value; t* is its absolute value
## [1] -2.580765

\[0.2792 \pm 2.5808 \times 0.0135 = 0.2792 \pm 0.0348 \quad\Longrightarrow\quad (0.2444,\ 0.3140)\]

We are 99% confident that the population slope \(\beta_1\) is between 0.2444 and 0.3140 pounds per week. The null value of 0 (no linear relationship) is not in the interval, which agrees with rejecting \(H_0\) at \(\alpha = 0.01\).
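The interval arithmetic can be reproduced in R from the estimate and standard error. A minimal sketch; the names `conf_level`, `t_star`, `ci`, and `contains_zero` are illustrative.

```r
b1 <- 0.2792          # slope estimate
se_b1 <- 0.0135       # standard error of the slope
df <- 998             # degrees of freedom (n - 2)
conf_level <- 0.99

# Upper critical value t*: leaves (1 - conf_level) / 2 in each tail
t_star <- qt(1 - (1 - conf_level) / 2, df)

# b1 minus and plus t* x SE
ci <- b1 + c(-1, 1) * t_star * se_b1
round(ci, 4)

# Does the interval contain the null value 0?
contains_zero <- ci[1] <= 0 && 0 <= ci[2]
contains_zero
```

With the data loaded via `library(openintro)`, `confint(lm(weight ~ weeks, data = births14), "weeks", level = 0.99)` computes the interval directly from the fit and should match up to rounding.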

Confidence Interval (2/2)