Hypothesis Testing for Linear Regression

Applied Statistics

MTH-361A | Spring 2026 | University of Portland

Objectives

Develop an understanding of hypothesis testing for linear regression
Know the procedural steps for inference for linear regression
Know how to conclude the hypothesis test for linear regression

Case Study 1

Consider the births14 data, originally gathered by the US Department of Health and Human Services. The data set can be found in the openintro R package.

Case Study 1: The Linear Model

We want to predict a baby's birth weight from the length of the pregnancy in weeks. The population linear model is \[y_{weight} = \beta_0 + \beta_1 x_{weeks} + e\] where \(e\) is the error term.

The relevant hypotheses for the linear model setting can be written in terms of the population slope parameter.

Here the population refers to a larger population of births in the US.

\(H_0: \beta_1= 0\), there is no linear relationship between weight and weeks.
\(H_A: \beta_1 \ne 0\), there is some linear relationship between weight and weeks.

Let’s set the significance level to be \(\alpha = 0.01\).

Technical Conditions

Linearity. The scatterplot of the explanatory and response variables must show a nearly linear relationship.
Independent Observations. The observations must be independent of one another.
Normally Distributed Residuals. The residuals must show a nearly normal distribution.
Constant or Equal Variability. The residuals must have roughly constant variance across the range of the explanatory variable (homoscedasticity).

Case Study 1: Residual Analysis (1/2)

The residuals appear approximately normal, supporting the normality assumption.
The residual plot shows roughly constant variance, suggesting homoscedasticity.
Slight clustering near the center may warrant further investigation, but otherwise the residuals show no systematic pattern.
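Diagnostic plots of the kind described above could be produced along the following lines. This is a minimal sketch using base R graphics, not necessarily how the course figures were made. The helper name `plot_residual_diagnostics` is hypothetical, and the example call assumes the openintro package is installed.

```r
# Hypothetical helper: draw three common residual diagnostics for a fitted lm.
plot_residual_diagnostics <- function(fit) {
  res <- residuals(fit)
  old_par <- par(mfrow = c(1, 3))  # three panels side by side
  on.exit(par(old_par))            # restore graphics settings afterwards

  # Residuals vs. fitted values: look for curvature or a fan shape.
  plot(fitted(fit), res, xlab = "Fitted values", ylab = "Residuals")
  abline(h = 0, lty = 2)

  # Histogram of residuals: look for a roughly bell-shaped distribution.
  hist(res, main = "", xlab = "Residuals")

  # Normal Q-Q plot: points close to the line suggest near-normal residuals.
  qqnorm(res, main = "")
  qqline(res)

  invisible(res)
}

# Example call; runs only if the openintro package (an assumption) is available.
if (requireNamespace("openintro", quietly = TRUE)) {
  births14 <- openintro::births14
  fit <- lm(weight ~ weeks, data = births14)
  plot_residual_diagnostics(fit)
}
```

The independence condition cannot be checked from these plots alone; it depends on how the data were collected.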

Case Study 1: Residual Analysis (2/2)

Case Study 1: Least Squares Approximation

The least squares estimates of the intercept and slope are given in the estimate column, with their standard errors in the SE column.

term	estimate	SE
\(b_0\)	-3.5980	0.5227
\(b_1\)	0.2792	0.0135

The least squares regression model uses the data to find a sample linear fit: \[\hat{y}_{weight} = -3.5980 + 0.2792 \times x_{weeks}.\]

R code:

library(openintro)  # provides the births14 data frame
lm(weight ~ weeks, data = births14)

where the data is stored as a data frame named births14 and weight and weeks are two numerical variables in the data frame.
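To recover the estimate and SE columns of the table from a fitted model, one option is to read them off the coefficient matrix returned by `summary()`. The sketch below assumes the openintro package is installed; the helper name `coef_table` is hypothetical.

```r
# Hypothetical helper: estimate and standard error for each term of an lm fit.
coef_table <- function(fit) {
  summary(fit)$coefficients[, c("Estimate", "Std. Error")]
}

# Example call; runs only if the openintro package (an assumption) is available.
if (requireNamespace("openintro", quietly = TRUE)) {
  births14 <- openintro::births14
  fit <- lm(weight ~ weeks, data = births14)
  print(round(coef_table(fit), 4))  # rows: (Intercept), weeks
}
```

The row labelled `(Intercept)` corresponds to \(b_0\) and the row labelled `weeks` corresponds to \(b_1\).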

Hypothesis Testing - Randomization Method (1/2)

Two different permutations of the weight variable with slightly different least squares regression lines.

Hypothesis Testing - Randomization Method (2/2)

Histogram of slopes given different permutations of the weight variable. The vertical red line is at the observed value of the slope, 0.28.
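The randomization method can be sketched in base R as follows: shuffle the response, refit the line, and record the slope, many times over. This is an illustration under stated assumptions rather than the code behind the figures. The helper name `permuted_slopes`, the number of permutations, and the seed are all arbitrary choices, and the example call assumes the openintro package is installed.

```r
# Hypothetical helper: slopes obtained after shuffling the response variable.
permuted_slopes <- function(x, y, n_perm = 1000) {
  replicate(n_perm, {
    y_shuffled <- sample(y)          # break any link between x and y
    coef(lm(y_shuffled ~ x))[["x"]]  # slope of the refitted line
  })
}

# Example call; runs only if the openintro package (an assumption) is available.
if (requireNamespace("openintro", quietly = TRUE)) {
  births14 <- openintro::births14
  set.seed(361)  # arbitrary seed, for reproducibility only

  observed <- coef(lm(weight ~ weeks, data = births14))[["weeks"]]
  null_slopes <- permuted_slopes(births14$weeks, births14$weight)

  # Histogram of permuted slopes with the observed slope marked in red.
  hist(null_slopes, xlim = range(c(null_slopes, observed)),
       main = "", xlab = "Slope under permutation")
  abline(v = observed, col = "red")

  # Two-sided randomization p-value: share of permuted slopes
  # at least as far from 0 as the observed slope.
  print(mean(abs(null_slopes) >= abs(observed)))
}
```

If the observed slope falls far outside the bulk of the permuted slopes, the data are hard to reconcile with \(H_0: \beta_1 = 0\).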

Hypothesis Testing - Theoretical Method (1/2)

The least squares estimates of the intercept and slope are given in the estimate column, with their standard errors in the SE column.

term	estimate	SE
\(b_0\)	-3.5980	0.5227
\(b_1\)	0.2792	0.0135

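Test statistic: \[t = \frac{b_1 - \text{null value}}{\text{SE}_{b_1}} = \frac{0.2792 - 0}{0.0135} \approx 20.69\] Dividing the rounded table values gives about 20.68; the reported 20.69 is consistent with using the unrounded estimates.
Degrees of freedom: \(df = n - k - 1\), where \(n\) is the sample size and \(k\) is the number of predictors. Here \(n = 1000\) and \(k = 1\), so \(df = 1000 - 1 - 1 = 998\).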
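The arithmetic above can be reproduced directly from the table values. The variable names below are illustrative choices (`df_resid` is used instead of `df` only to avoid masking the built-in function of that name). Because the table values are rounded, the result can differ in the last digit from a statistic computed on unrounded estimates.

```r
b1 <- 0.2792      # slope estimate from the table
se_b1 <- 0.0135   # standard error of the slope from the table
null_value <- 0   # slope under the null hypothesis

# Test statistic: distance of the estimate from the null value, in SE units.
t_stat <- (b1 - null_value) / se_b1

n <- 1000         # sample size
k <- 1            # number of predictors
df_resid <- n - k - 1

t_stat    # roughly 20.68 with these rounded inputs
df_resid  # 998
```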

Hypothesis Testing - Theoretical Method (2/2)

Because this is a two-sided test, the p-value is twice the upper-tail probability; it is so small that R reports it as 0.
Since the p-value is less than \(\alpha = 0.01\), we reject the null hypothesis: we have enough evidence that the true slope (relationship) between weeks and weight is non-zero.
More specifically, since the test statistic is positive, the evidence points to a positive relationship. The sign of the statistic indicates the direction of the relationship, not its strength.

Using R to compute the p-value

df <- 998        # degrees of freedom
t_stat <- 20.69  # test statistic
2 * (1 - pt(t_stat, df))  # two-sided p-value; pt() rounds to 1 here, so this prints 0

## [1] 0
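The result prints as exactly 0 because `pt()` returns a value that rounds to 1 in floating point, so the subtraction loses the tail probability. A sketch of an alternative is to ask `pt()` for the upper tail directly with `lower.tail = FALSE`, which keeps the tiny probability representable. The variable names are illustrative.

```r
df_resid <- 998   # degrees of freedom
t_stat <- 20.69   # test statistic

# Upper-tail probability computed directly, then doubled for a two-sided test.
p_value <- 2 * pt(abs(t_stat), df_resid, lower.tail = FALSE)

p_value          # extremely small, but strictly greater than 0
p_value < 0.01   # compare with the significance level
```

The conclusion is unchanged: the p-value is far below \(\alpha = 0.01\).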

Confidence Interval (1/2)

99% Confidence interval for the slope

\[b_1 \pm t_{df}^* \text{SE}_{b_1}\]

Critical t-star for a 0.99 confidence level: \(t_{df}^* = t_{998}^* = 2.5808\). Note that we use \(1-\alpha\) as the confidence level, where \(\alpha = 0.01\).

Using R to compute the critical t-star

df <- 998          # degrees of freedom
lt <- (1 - 0.99)/2 # lower tail probability
qt(lt, df)         # lower-tail quantile; t-star is its absolute value

## [1] -2.580765

\[0.2792 \pm 2.5808 \times 0.0135 \quad \Rightarrow \quad (0.2444, 0.3140)\]

We are 99% confident that the true slope is between 0.2444 and 0.3140. Note that the null value of 0 (no slope) is not within the interval, which agrees with rejecting \(H_0\) at \(\alpha = 0.01\).
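The interval can be reproduced from the table values in a few lines. This is a sketch with illustrative variable names; it asks `qt()` for the upper quantile so that the critical value comes out positive.

```r
b1 <- 0.2792        # slope estimate from the table
se_b1 <- 0.0135     # standard error of the slope from the table
df_resid <- 998     # degrees of freedom
conf_level <- 0.99  # confidence level, 1 - alpha

# Positive critical value: the quantile leaving alpha/2 in the upper tail.
t_star <- qt(1 - (1 - conf_level) / 2, df_resid)

margin <- t_star * se_b1
ci <- c(lower = b1 - margin, upper = b1 + margin)
round(ci, 4)
```

With a fitted model object such as the one returned by `lm(weight ~ weeks, data = births14)`, calling `confint()` on it with `level = 0.99` should give the same interval up to rounding of the inputs.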