MTH-361A | Spring 2025 | University of Portland
April 2, 2025
The Linear Model
A linear model is written as
\[ y = \beta_0 + \beta_1 x + \epsilon \]
where \(y\) is the outcome, \(x\) is the predictor, \(\beta_0\) is the intercept, and \(\beta_1\) is the slope. The term \(\epsilon\) represents the model's error.
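As a minimal sketch of this model (the parameter values, noise level, and sample size below are hypothetical, chosen only for illustration), we can simulate data from it in Python:

```python
import numpy as np

# A minimal simulation of y = beta_0 + beta_1 * x + epsilon.
# The parameter values and noise level below are hypothetical.
rng = np.random.default_rng(seed=0)
beta_0, beta_1 = 5.0, 0.6              # hypothetical intercept and slope
n = 50                                 # number of observations

x = rng.uniform(0, 10, size=n)         # predictor values
epsilon = rng.normal(0, 1.5, size=n)   # mean-zero model error
y = beta_0 + beta_1 * x + epsilon      # outcomes generated by the model
```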
Notation:
We can use the sample statistics \(b_0\) and \(b_1\) as point estimates to infer the true values of the population parameters \(\beta_0\) and \(\beta_1\).
Sample data with their best-fitting lines (top row) and their corresponding residual plots (bottom row).
Terms:
The Sum of Squared Error (SSE) is a measure of the left-over variability in the \(y\) values when we know \(x\).
\[ SSE = \sum_{i=1}^n (e_i)^2 \]
The Total Sum of Squares (SST) measures the variability in the \(y\) values by how far they tend to fall from their mean, \(\bar{y}\).
\[ SST = \sum_{i=1}^n (y_i - \bar{y})^2 \]
where \(\bar{y} = \frac{1}{n} \sum_{i=1}^n y_i\), and \(n\) is the number of observations.
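Continuing the simulation sketch above, and taking a hypothetical candidate fit \((b_0, b_1)\), both quantities can be computed directly:

```python
import numpy as np

# Reuses x and y from the simulation above. The candidate fit (b0, b1)
# here is a hypothetical guess; any line yields an SSE this way.
b0, b1 = 5.0, 0.6
y_hat = b0 + b1 * x                  # fitted values
e = y - y_hat                        # residuals e_i

SSE = np.sum(e ** 2)                 # left-over variability given x
SST = np.sum((y - y.mean()) ** 2)    # total variability around the mean of y
```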
To find the best linear fit, we minimize the SSE.
\[ SSE = \sum_{i=1}^n (e_i)^2 = \sum_{i=1}^n (y_i - \hat{y_i})^2 \]
Plugging in the linear equation \(\hat{y}_i = b_0 + b_1 x_i\), we have
\[ SSE = \sum_{i=1}^n (y_i - (b_0 + b_1 x_i))^2. \]
Minimizing the above equation over all possible values of \(b_0\) and \(b_1\) is a calculus problem: take the partial derivatives of SSE with respect to \(b_0\) and \(b_1\), set them equal to zero, and solve for \(b_0\) and \(b_1\).
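Concretely, these two conditions are the normal equations
\[ \begin{aligned} \frac{\partial SSE}{\partial b_0} & = -2 \sum_{i=1}^n (y_i - b_0 - b_1 x_i) = 0 \\ \frac{\partial SSE}{\partial b_1} & = -2 \sum_{i=1}^n x_i (y_i - b_0 - b_1 x_i) = 0. \end{aligned} \]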
Long story short,
\[ \begin{aligned} b_1 = & \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \\ b_0 = & \bar{y} - b_1 \bar{x} \end{aligned} \]
where \(\bar{x}\) and \(\bar{y}\) are the means of \(x\) and \(y\), respectively.
We can rewrite the slope as follows:
\[ \begin{aligned} b_1 & = \frac{\sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^n (x_i - \bar{x})^2} \\ & = \frac{\sqrt{\frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2}}{\sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2}} \frac{\sum_{i=1}^n{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}} \\ b_1 & = \frac{s_y}{s_x} r \end{aligned} \]
where \(s_y\) and \(s_x\) are the standard deviations of \(y\) and \(x\) respectively, and \(r\) is the Pearson correlation coefficient of \(x\) and \(y\).
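As a quick numerical sanity check (reusing the simulated `x` and `y` from the sketch above), the two forms of the slope agree:

```python
import numpy as np

# Two equivalent forms of the slope, using the simulated x and y above.
b1_direct = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)

s_x = x.std(ddof=1)            # sample standard deviation of x
s_y = y.std(ddof=1)            # sample standard deviation of y
r = np.corrcoef(x, y)[0, 1]    # Pearson correlation coefficient
b1_from_r = (s_y / s_x) * r

print(np.isclose(b1_direct, b1_from_r))   # True: both forms give the same slope
```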
Least-Squares Example Visualization: Shown here is some data (orange dots) and the best-fit linear model (red line) \(y = 5.37 + 0.62x\). You can try this least-squares regression interactive demo to visualize how it works.
To find the best fit linear model to data, we compute the slope and intercept by using the correlation and standard deviations.
\[ \begin{aligned} \text{mean of x} \longrightarrow & \bar{x} = \frac{1}{n} \sum_{i=1}^n x_i \\ \text{mean of y} \longrightarrow & \bar{y} = \frac{1}{n} \sum_{i=1}^n y_i \\ \text{standard deviation of x} \longrightarrow & s_x = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (x_i - \bar{x})^2} \\ \text{standard deviation of y} \longrightarrow & s_y = \sqrt{\frac{1}{n-1} \sum_{i=1}^n (y_i - \bar{y})^2} \\ \text{correlation of x and y} \longrightarrow & r = \frac{\sum_{i=1}^n{(x_i - \bar{x})(y_i - \bar{y})}}{\sqrt{\sum_{i=1}^n (x_i - \bar{x})^2 \sum_{i=1}^n (y_i - \bar{y})^2}} \\ \text{best fit slope} \longrightarrow & b_1 = \frac{s_y}{s_x} r \\ \text{best fit intercept} \longrightarrow & b_0 = \bar{y} - b_1 \bar{x} \end{aligned} \]
Note that the \(n-1\) term, known as Bessel's correction, makes the sample variance \(s^2\) an unbiased estimator of the population variance \(\sigma^2\).
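As a sketch of this recipe end-to-end (again assuming the simulated `x` and `y` from above), we can cross-check the hand-computed fit against NumPy's built-in `np.polyfit`:

```python
import numpy as np

# Best-fit slope and intercept following the recipe above (simulated x, y),
# cross-checked against NumPy's built-in least-squares polynomial fit.
r = np.corrcoef(x, y)[0, 1]
b1 = (y.std(ddof=1) / x.std(ddof=1)) * r
b0 = y.mean() - b1 * x.mean()

slope, intercept = np.polyfit(x, y, deg=1)   # degree-1 fit: slope first
print(np.allclose([b1, b0], [slope, intercept]))   # True
```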
The coefficient of determination can then be calculated as
\[ R^2 = \frac{SST - SSE}{SST} = 1 - \frac{SSE}{SST} \]
where
\[ SSE = \sum_{i=1}^n (e_i)^2 \hspace{10px} \text{ and } \hspace{10px} SST = \sum_{i=1}^n (y_i - \bar{y})^2. \]
The range of \(R^2\) is from 0 to 1, and \(R^2\) is a measure of how well the linear regression fits the data.
Interpretation:
In the case of a linear model with one predictor and one outcome, the relationship between the correlation and the coefficient of determination is \(R^2 = r^2\).
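Continuing the earlier simulation sketch, we can verify this relationship numerically:

```python
import numpy as np

# R^2 from its definition, then checked against r^2 (simulated x, y above).
r = np.corrcoef(x, y)[0, 1]
b1 = (y.std(ddof=1) / x.std(ddof=1)) * r
b0 = y.mean() - b1 * x.mean()

e = y - (b0 + b1 * x)                 # residuals of the best fit
SSE = np.sum(e ** 2)
SST = np.sum((y - y.mean()) ** 2)
R2 = 1 - SSE / SST

print(np.isclose(R2, r ** 2))         # True for simple linear regression
```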
Three plots, each with a least squares line and corresponding residual plot. Each dataset has at least one outlier.
Types of outliers.
We must be cautious when removing outliers from our models. Sometimes outliers are interesting cases that might be worth investigating, and they might even make a model much better.
Try out this least-squares regression interactive demo to play around with the effect of outliers on the fit.