MTH-361A | Spring 2025 | University of Portland
April 11, 2025
Linear Regression
\[ y = \beta_0 + \beta_1 x + \epsilon \]
Consider the births data originally gathered by the US Department of Health and Human Services. The births14 data set can be found in the openintro R package.
We want to predict a baby's weight from the length of the pregnancy in weeks. The population linear model is \[y_{weight} = \beta_0 + \beta_1 x_{weeks} + \epsilon\]
The relevant hypotheses for the linear model setting can be written in terms of the population slope parameter.
Here the population refers to a larger population of births in the US.
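\(H_0: \beta_1 = 0\). There is no linear relationship between weight and weeks.
\(H_A: \beta_1 \ne 0\). There is some linear relationship between weight and weeks.
Let’s set the significance level to be \(\alpha = 0.01\).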
term | estimate | std.error |
---|---|---|
(Intercept) | -3.5980 | 0.5227 |
weeks | 0.2792 | 0.0135 |
The least squares regression model uses the data to find a sample linear fit: \[\hat{y}_{weight} = -3.5980 + 0.2792 \times x_{weeks}.\]
In the R code for this fit, the data is stored as a data frame named births14, and weight and weeks are two numerical variables in that data frame.
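The call itself did not survive in these notes. A minimal sketch of a fit that would produce a table like the one above, assuming the openintro package is installed and using base R's lm(). The object name fit and the input of 39 weeks are placeholders chosen for illustration.

```r
# Assumes the openintro package is installed; it provides births14
library(openintro)

# Least squares fit of weight on weeks
fit <- lm(weight ~ weeks, data = births14)

# Coefficient table: estimate, standard error, t value, p-value
coef(summary(fit))

# Predicted weight at an illustrative 39 weeks (hypothetical input)
predict(fit, newdata = data.frame(weeks = 39))
```

The term, estimate, std.error layout of the table resembles the output of broom::tidy(fit), but that is only a guess about how the table was produced.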
Two different permutations of the weight variable with slightly different least squares regression lines.
Histogram of slopes given different permutations of the weight variable. The vertical red line is at the observed value of the slope, 0.28.
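The permutation idea behind these figures can be sketched as follows. Shuffling weight breaks any association with weeks, so the slopes from shuffled data show what slopes look like when the null hypothesis is true. This is a sketch under assumptions, not the code used to make the figures: it assumes the openintro package is installed, and the seed, the 1000 repetitions, and the names obs_slope and perm_slopes are arbitrary choices.

```r
library(openintro)  # assumed installed; provides births14

set.seed(361)  # arbitrary seed, for reproducibility only

# Observed slope from the original data
obs_slope <- coef(lm(weight ~ weeks, data = births14))[["weeks"]]

# Slopes after randomly permuting weight (1000 is an arbitrary choice)
perm_slopes <- replicate(1000, {
  coef(lm(sample(weight) ~ weeks, data = births14))[["weeks"]]
})

# Histogram of permuted slopes with the observed slope marked in red
hist(perm_slopes,
     xlim = range(c(perm_slopes, obs_slope)),
     main = "Slopes under permutation", xlab = "slope")
abline(v = obs_slope, col = "red")

# Proportion of permuted slopes at least as extreme as the observed one
mean(abs(perm_slopes) >= abs(obs_slope))
```

If none of the permuted slopes is as far from 0 as the observed slope, the last line returns 0, which is read as a p-value smaller than 1 divided by the number of permutations.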
term | estimate | std.error | statistic | p.value |
---|---|---|---|---|
(Intercept) | -3.5980 | 0.5227 | -6.8831 | 0 |
weeks | 0.2792 | 0.0135 | 20.6988 | 0 |
Using R to compute the p-value
## [1] 0
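The R call that produced this output is not shown. One way to get the two-sided p-value from the t statistic in the table is sketched below. The degrees of freedom, df = 998, is an assumption (n - 2 with 1000 complete observations), not a value stated in these notes.

```r
t_stat <- 20.6988   # t statistic for weeks, from the table above
df <- 998           # assumption: n - 2 with n = 1000 births

# The statistic is the estimate divided by its standard error;
# with the rounded table values this gives about 20.68
0.2792 / 0.0135

# Two-sided p-value: area in both tails of the t distribution
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
p_value
```

The result is positive but far smaller than \(\alpha = 0.01\), which is consistent with the table reporting it as 0. Since the p-value is below the significance level, we reject the null hypothesis.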
99% Confidence interval for the slope
\[b_1 \pm t_{df}^* \text{SE}_{b_1}\]
Using R to compute the critical t-star
## [1] -2.580765
\[ \begin{aligned} 0.2792 & \pm 2.5808 \times 0.0135 \end{aligned} \] \[(0.2444,0.3140)\]
We are 99% confident that the population slope is between 0.2444 and 0.3140. Note that the null value of 0 (no slope) is not within the interval, which agrees with rejecting the null hypothesis at \(\alpha = 0.01\).
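The interval arithmetic above can be reproduced in R. A minimal sketch using the rounded table values; df = 998 is an assumption (n - 2 with 1000 complete observations), not a value stated in these notes.

```r
b1 <- 0.2792   # slope estimate from the table
se <- 0.0135   # its standard error
df <- 998      # assumption: n - 2 with n = 1000 births

# Critical value for a 99% interval (upper 0.5% point of the t distribution)
t_star <- qt(1 - 0.01 / 2, df = df)

# Lower and upper limits of the interval
ci <- b1 + c(-1, 1) * t_star * se
round(ci, 4)
```

If a fitted model object from lm() is available, confint() with level = 0.99 computes the same kind of interval from the unrounded estimates.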
Rename the .Rmd file by replacing [name] with your name, using the format [First name][Last initial]. Then open the .Rmd file.