MTH-161D | Spring 2025 | University of Portland
March 28, 2025
These slides are derived from Diez et al. (2012).
Confidence Interval for One Proportion
\[\hat{p} \pm z^{\star} \text{SE}_{\hat{p}}\]
\[ \begin{aligned} \hat{p} & \longrightarrow \text{sample proportion (or the point estimate)} \\ z^{\star} & \longrightarrow \text{critical z-score at a given confidence level} \\ \text{SE}_{\hat{p}} & \longrightarrow \text{standard error of the sampling distribution} \\ \end{aligned} \]
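A minimal sketch of this interval in R. The counts below (45 successes out of 100) are hypothetical and not from the slides, and the standard error is assumed to take the usual form \(\sqrt{\hat{p}(1-\hat{p})/n}\).

```r
# Hypothetical data (assumption for illustration only)
x <- 45            # number of successes
n <- 100           # sample size
conf_level <- 0.95 # confidence level

p_hat  <- x / n                             # sample proportion (point estimate)
z_star <- qnorm(1 - (1 - conf_level) / 2)   # critical z-score
se     <- sqrt(p_hat * (1 - p_hat) / n)     # standard error of p-hat

ci <- p_hat + c(-1, 1) * z_star * se        # lower and upper bounds
ci
```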
Hypothesis Testing for One Proportion
\[ \begin{aligned} p & \longrightarrow \text{population proportion} \\ \hat{p} & \longrightarrow \text{sample proportion (or the point estimate)} \\ H_0: p = p_0 & \longrightarrow \text{null hypothesis} \\ H_A: p \ne p_0 & \longrightarrow \text{alternative hypothesis (can be } < \text{ or } > \text{)} \\ z & \longrightarrow \text{test statistic} \\ \text{SE}_{p} & \longrightarrow \text{standard error of the null distribution} \\ \end{aligned} \]
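A minimal sketch of the two-sided test in R. The data and the null value \(p_0 = 0.5\) are hypothetical, and the sketch assumes the standard test statistic \(z = (\hat{p} - p_0)/\text{SE}_{p}\) with \(\text{SE}_{p} = \sqrt{p_0(1-p_0)/n}\).

```r
# Hypothetical data (assumption for illustration only)
x  <- 45    # number of successes
n  <- 100   # sample size
p0 <- 0.5   # null value

p_hat   <- x / n                       # sample proportion
se_null <- sqrt(p0 * (1 - p0) / n)     # standard error under H0
z       <- (p_hat - p0) / se_null      # test statistic
p_value <- 2 * pnorm(-abs(z))          # two-sided p-value

c(z = z, p_value = p_value)
```

For a one-sided alternative, the p-value would use a single tail, for example `pnorm(z)` for \(H_A: p < p_0\).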
Relationship Between Variables
\[\text{explanatory variable} \xrightarrow{\text{might affect}} \text{response variable}\]
Associated vs Independent Variables
When two variables show some connection with one another, they are called associated or dependent variables.
In general, association does not imply causation, and causation can only be inferred from a randomized experiment.
The contingency table below shows survival counts by passenger class on the Titanic.
Survived | 1st | 2nd | 3rd | crew | Sum |
---|---|---|---|---|---|
no | 123 | 166 | 528 | 679 | 1496 |
yes | 201 | 118 | 181 | 211 | 711 |
Sum | 324 | 284 | 709 | 890 | 2207 |
Goals:
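The chi-square statistic is used in hypothesis testing to determine whether observed data differ significantly from expected data. It is commonly used for categorical data analysis.
\[\chi^2 = \sum \frac{(O-E)^2}{E}\]
Here \(O\) is an observed count, \(E\) is the corresponding expected count, and the sum runs over all categories (or table cells).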
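The degrees of freedom (df) in a chi-square test depend on the type of test being conducted: a goodness-of-fit test uses the number of categories minus 1, and a test of independence uses \((\text{rows} - 1) \times (\text{columns} - 1)\).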
1st Class Distribution
\[ \begin{aligned} H_0: & \text{The observed distribution of survived among 1st class passengers follows the expectation.} \\ H_A: & \text{The observed distribution of survived among 1st class passengers differs from the expectation.} \end{aligned} \]
Survived | 1st |
---|---|
no | 123 |
yes | 201 |
\[ \begin{aligned} \chi^2 & = \frac{(123 - 162)^2}{162} + \frac{(201 - 162)^2}{162} \\ & = 18.78 \end{aligned} \]
Determine the degrees of freedom: with two categories (no, yes), \(df = 2 - 1 = 1\).
Compute the p-value using R.
df <- 1 # define degrees of freedom
chisq <- 18.78 # set chi-square statistic
1 - pchisq(chisq, df) # compute the p-value (upper-tail area)
## [1] 1.466974e-05
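The same calculation can be sketched from the raw counts. This assumes, as the worked statistic does, that the expectation is an even split of the 324 first-class passengers (162 in each category). Because the statistic is not rounded to 18.78 here, the p-value differs slightly in its later digits.

```r
observed <- c(no = 123, yes = 201)        # 1st class counts from the table
expected <- rep(sum(observed) / 2, 2)     # even split under H0: 162 each

chisq   <- sum((observed - expected)^2 / expected)  # chi-square statistic
df      <- length(observed) - 1                     # 2 categories, so df = 1
p_value <- pchisq(chisq, df, lower.tail = FALSE)    # upper-tail probability

# Built-in cross-check: given a vector, chisq.test() tests against
# equal probabilities by default
chisq.test(observed)
```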
2nd Class Distribution
\[ \begin{aligned} H_0: & \text{The observed distribution of survived among 2nd class passengers follows the expectation.} \\ H_A: & \text{The observed distribution of survived among 2nd class passengers differs from the expectation.} \end{aligned} \]
Survived | 2nd |
---|---|
no | 166 |
yes | 118 |
\(\dagger\) Determine the \(\chi^2\) test statistic and the p-value. What is your conclusion?
Survived | 1st | 2nd | 3rd | crew | Sum |
---|---|---|---|---|---|
no | 123 | 166 | 528 | 679 | 1496 |
yes | 201 | 118 | 181 | 211 | 711 |
Sum | 324 | 284 | 709 | 890 | 2207 |
\[ \begin{aligned} H_0: & \text{There is no association between class and survived. The variables are independent.} \\ H_A: & \text{There is an association between class and survived. The variables are dependent.} \end{aligned} \]
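One way to carry out this test of independence in R is sketched below with `chisq.test`. The object names `titanic` and `result` are illustrative.

```r
# Observed counts from the contingency table (margins excluded)
titanic <- matrix(
  c(123, 166, 528, 679,
    201, 118, 181, 211),
  nrow = 2, byrow = TRUE,
  dimnames = list(survived = c("no", "yes"),
                  class = c("1st", "2nd", "3rd", "crew"))
)

result <- chisq.test(titanic)  # chi-square test of independence
result$expected                # expected counts under H0
result$parameter               # df = (2 - 1) * (4 - 1) = 3
result$p.value                 # p-value
```

A small p-value would be evidence against \(H_0\), that is, evidence of an association between class and survival.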
Consider the following problem description.
Students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below. Do these data provide evidence to suggest that goals vary by grade?
Grade | Grades | Popular | Sports |
---|---|---|---|
4th | 63 | 31 | 25 |
5th | 88 | 55 | 33 |
6th | 96 | 55 | 32 |
Source: Popular Kids Dataset. This is from a 1992 study and was revisited 30 years later.
Grade | Grades | Popular | Sports | Total |
---|---|---|---|---|
4th | \(\color{blue}{63}\) | \(\color{orange}{31}\) | 25 | 119 |
5th | 88 | 55 | 33 | 176 |
6th | 96 | 55 | \(\color{red}{32}\) | 183 |
Total | 247 | 141 | 90 | 478 |
Note: Each color matches a cell in the table to its expected-frequency calculation. Expected frequencies are rounded to the nearest integer.
\[\color{blue}{E_{\text{4th, Grades}} = \frac{(119)(247)}{478} = 61}\] \[\color{orange}{E_{\text{4th, Popular}} = \frac{(119)(141)}{478} = 35}\] \[\vdots\] \[\color{red}{E_{\text{6th, Sports}} = \frac{(183)(90)}{478} = 34}\]
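Each expected count above is (row total × column total) / table total. A short sketch in R that computes every cell at once from the marginal totals, using `outer` (the variable names are illustrative):

```r
# Marginal totals from the table
row_totals <- c("4th" = 119, "5th" = 176, "6th" = 183)
col_totals <- c(Grades = 247, Popular = 141, Sports = 90)
n <- sum(row_totals)                       # table total: 478

# Expected count for each cell under independence
expected <- outer(row_totals, col_totals) / n

# The three highlighted cells, rounded to the nearest integer
round(expected["4th", "Grades"])   # 61
round(expected["4th", "Popular"])  # 35
round(expected["6th", "Sports"])   # 34
```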
Grade | Grades | Popular | Sports | Total |
---|---|---|---|---|
4th | 63 | 31 | 25 | 119 |
5th | 88 | \(\color{green}{55}\) | 33 | 176 |
6th | 96 | 55 | 32 | 183 |
Total | 247 | 141 | 90 | 478 |
Grade | Grades | Popular | Sports | Total |
---|---|---|---|---|
4th | 63 \(\color{blue}{[61]}\) | 31 \(\color{blue}{[35]}\) | 25 \(\color{blue}{[23]}\) | 119 |
5th | 88 \(\color{blue}{[91]}\) | 55 \(\color{blue}{[52]}\) | 33 \(\color{blue}{[33]}\) | 176 |
6th | 96 \(\color{blue}{[95]}\) | 55 \(\color{blue}{[54]}\) | 32 \(\color{blue}{[34]}\) | 183 |
Total | 247 | 141 | 90 | 478 |
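The \(\chi^2\) statistic. \[\chi^2_{k} = \frac{(63-61)^2}{61} + \frac{(31-35)^2}{35} + \cdots + \frac{(32-34)^2}{34} = 1.3121\] The terms are shown with rounded expected counts; the value 1.3121 comes from the unrounded expected counts.
Degrees of freedom. \[k = (3-1) \times (3-1) = 2(2) = 4\]
\(\chi^2_{k} = 1.3121\) and \(k = 4\)
We can use the pchisq function in R to compute the p-value.
1 - pchisq(1.3121, 4) # compute the p-value
## [1] 0.8593193
The p-value is 0.8593.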
Do these data provide evidence to suggest that goals vary by grade? \[H_0: \text{Grade and goals are independent. Goals do not vary by grade.}\] \[H_A: \text{Grade and goals are dependent. Goals vary by grade.}\]
Since the p-value is large, we fail to reject \(H_0\). The data do not provide convincing evidence that grade and goals are dependent. It doesn’t appear that goals vary by grade.
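The whole test can also be sketched with `chisq.test`, which works from unrounded expected counts. The 4th-grade Sports cell is entered as 25, the value implied by the row total 119 and the column total 90. That value is an assumption of this sketch, as are the object names.

```r
# Observed counts (margins excluded)
goals <- matrix(
  c(63, 31, 25,   # 4th-grade Sports = 119 - 63 - 31 = 25 (assumed from the totals)
    88, 55, 33,
    96, 55, 32),
  nrow = 3, byrow = TRUE,
  dimnames = list(grade = c("4th", "5th", "6th"),
                  goal = c("Grades", "Popular", "Sports"))
)

result <- chisq.test(goals)  # chi-square test of independence
result$statistic             # chi-square statistic
result$parameter             # degrees of freedom: (3 - 1) * (3 - 1) = 4
result$p.value               # p-value
```

A p-value well above common significance levels leads to the same decision: fail to reject \(H_0\).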
\(\star\) Key Idea: The chi-square test assumes independent categorical data with sufficiently large expected counts and compares observed vs. expected frequencies to assess whether deviations are due to chance.