MTH-361A | Spring 2026 | University of Portland
The contingency table below shows survival counts (no/yes) by class for the passengers and crew of the Titanic.
| | 1st | 2nd | 3rd | crew | Sum |
|---|---|---|---|---|---|
| no | 123 | 166 | 528 | 679 | 1496 |
| yes | 201 | 118 | 181 | 211 | 711 |
| Sum | 324 | 284 | 709 | 890 | 2207 |
Goals:
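The chi-square statistic is used in hypothesis testing to determine whether observed counts differ significantly from expected counts. It is commonly used for categorical data analysis. In the formula, \(O\) is an observed count, \(E\) is the corresponding expected count, and the sum runs over all categories (or cells).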
\[\chi^2 = \sum \frac{(O-E)^2}{E}\]
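The degrees of freedom (df) in a chi-square test depend on the type of test being conducted: for a goodness-of-fit test on \(k\) categories, \(df = k - 1\); for a test of independence on a table with \(r\) rows and \(c\) columns, \(df = (r-1)(c-1)\).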
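As a minimal sketch of the two most common cases: a goodness-of-fit test on \(k\) categories has \(k - 1\) degrees of freedom, and a test of independence on an \(r \times c\) table has \((r-1)(c-1)\). The helper names `df_gof` and `df_independence` are hypothetical and introduced only for illustration; the counts are taken from the Titanic table.

```r
# Hypothetical helpers (not from any package) for the two common cases.

# Goodness-of-fit: number of categories minus one.
df_gof <- function(observed) length(observed) - 1

# Test of independence: (rows - 1) * (columns - 1), margin totals excluded.
df_independence <- function(tab) (nrow(tab) - 1) * (ncol(tab) - 1)

# 1st class survival counts (2 categories).
first_class <- c(no = 123, yes = 201)
df_gof(first_class)

# Titanic survival-by-class table (2 rows, 4 columns, no "Sum" margins).
titanic <- matrix(c(123, 166, 528, 679,
                    201, 118, 181, 211),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(survived = c("no", "yes"),
                                  class = c("1st", "2nd", "3rd", "crew")))
df_independence(titanic)
```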
1st Class Distribution
\[ \begin{aligned} H_0: & \text{The observed distribution of survived among 1st class passengers follows the expectation.} \\ H_A: & \text{The observed distribution of survived among 1st class passengers differs from the expectation.} \end{aligned} \]
| | 1st |
|---|---|
| no | 123 |
| yes | 201 |
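Under \(H_0\), the 324 passengers are expected to split evenly between the two categories, so the expected count in each is \(324/2 = 162\).
\[ \begin{aligned} \chi^2 & = \frac{(123 - 162)^2}{162} + \frac{(201 - 162)^2}{162} \\ & = 18.78 \end{aligned} \]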
Determine the degrees of freedom: with two categories, \(df = 2 - 1 = 1\).
Compute the p-value using R.
df <- 1 # define degrees of freedom
chisq <- 18.78 # set chi-square statistic
1 - pchisq(chisq, df) # compute the p-value
## [1] 1.466974e-05
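The hand computation above can be cross-checked in R. The sketch below assumes the null hypothesis is an equal split between the two categories (which is what an expected count of 162 in each implies) and uses the base R function `chisq.test`; the variable names are illustrative.

```r
# Observed survival counts for 1st class passengers.
observed <- c(no = 123, yes = 201)

# Expected counts under an equal split (assumption: 50% in each category).
expected <- sum(observed) * c(0.5, 0.5)

# Chi-square statistic from the formula sum((O - E)^2 / E).
chisq_manual <- sum((observed - expected)^2 / expected)

# Upper-tail p-value with df = number of categories - 1.
df <- length(observed) - 1
p_manual <- pchisq(chisq_manual, df, lower.tail = FALSE)

# The same goodness-of-fit test using base R's chisq.test.
fit <- chisq.test(observed, p = c(0.5, 0.5))

chisq_manual   # statistic from the formula
p_manual       # p-value from pchisq
fit$statistic  # should agree with chisq_manual
fit$p.value    # should agree with p_manual
```

Passing `lower.tail = FALSE` is equivalent to `1 - pchisq(...)` but avoids loss of precision when the p-value is very small.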
2nd Class Distribution
\[ \begin{aligned} H_0: & \text{The observed distribution of survived among 2nd class passengers follows the expectation.} \\ H_A: & \text{The observed distribution of survived among 2nd class passengers differs from the expectation.} \end{aligned} \]
| | 2nd |
|---|---|
| no | 166 |
| yes | 118 |
\(\dagger\) Determine the \(\chi^2\) test statistic and the p-value. What is your conclusion?
| | 1st | 2nd | 3rd | crew | Sum |
|---|---|---|---|---|---|
| no | 123 | 166 | 528 | 679 | 1496 |
| yes | 201 | 118 | 181 | 211 | 711 |
| Sum | 324 | 284 | 709 | 890 | 2207 |
\[ \begin{aligned} H_0: & \text{There is no association between class and survived. The variables are independent.} \\ H_A: & \text{There is an association between class and survived. The variables are dependent.} \end{aligned} \]
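One way to carry out this test in R is sketched below. The object name `titanic` is illustrative, the counts are those in the table above with the `Sum` margins left out, and `chisq.test` is the base R function for a test of independence on a two-way table.

```r
# Observed counts: rows are survival status, columns are class.
titanic <- matrix(c(123, 166, 528, 679,
                    201, 118, 181, 211),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(survived = c("no", "yes"),
                                  class = c("1st", "2nd", "3rd", "crew")))

# Test of independence between survival and class.
test <- chisq.test(titanic)

test$expected   # expected counts: (row total * column total) / grand total
test$statistic  # chi-square statistic
test$parameter  # degrees of freedom: (2 - 1) * (4 - 1) = 3
test$p.value    # p-value
```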
Consider the following problem description.
Students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below. Do these data provide evidence to suggest that goals vary by grade?
| | Grades | Popular | Sports |
|---|---|---|---|
| 4th | 63 | 31 | 25 |
| 5th | 88 | 55 | 33 |
| 6th | 96 | 55 | 32 |
Source: Popular Kids Dataset. This is from a 1992 study and was revisited 30 years later.
| | Grades | Popular | Sports | Total |
|---|---|---|---|---|
| 4th | \(\color{blue}{63}\) | \(\color{orange}{31}\) | 25 | 119 |
| 5th | 88 | 55 | 33 | 176 |
| 6th | 96 | 55 | \(\color{red}{32}\) | 183 |
| Total | 247 | 141 | 90 | 478 |
Note: Each color marks the cell whose expected frequency is computed in the matching color. Expected frequencies are rounded to the nearest integer.
\[\color{blue}{E_{\text{4th, Grades}} = \frac{(119)(247)}{478} \approx 61}\]
\[\color{orange}{E_{\text{4th, Popular}} = \frac{(119)(141)}{478} \approx 35}\]
\[\vdots\]
\[\color{red}{E_{\text{6th, Sports}} = \frac{(183)(90)}{478} \approx 34}\]
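The same rule, row total times column total divided by the grand total, can be applied to every cell at once. The sketch below is one way to do it in base R; the object name `kids` is illustrative, and the 4th-grade Sports count is entered as 25, the value implied by the row total of 119 and the column total of 90.

```r
# Observed counts: rows are school grade, columns are the goal chosen.
kids <- matrix(c(63, 31, 25,
                 88, 55, 33,
                 96, 55, 32),
               nrow = 3, byrow = TRUE,
               dimnames = list(grade = c("4th", "5th", "6th"),
                               goal = c("Grades", "Popular", "Sports")))

# Expected count for each cell: (row total * column total) / grand total.
expected <- outer(rowSums(kids), colSums(kids)) / sum(kids)

round(expected, 2)  # unrounded values, shown to two decimals
round(expected)     # rounded to the nearest integer
```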
| | Grades | Popular | Sports | Total |
|---|---|---|---|---|
| 4th | 63 | 31 | 25 | 119 |
| 5th | 88 | \(\color{green}{55}\) | 33 | 176 |
| 6th | 96 | 55 | 32 | 183 |
| Total | 247 | 141 | 90 | 478 |
| | Grades | Popular | Sports | Total |
|---|---|---|---|---|
| 4th | 63 \(\color{blue}{[61]}\) | 31 \(\color{blue}{[35]}\) | 25 \(\color{blue}{[22]}\) | 119 |
| 5th | 88 \(\color{blue}{[91]}\) | 55 \(\color{blue}{[52]}\) | 33 \(\color{blue}{[33]}\) | 176 |
| 6th | 96 \(\color{blue}{[95]}\) | 55 \(\color{blue}{[54]}\) | 32 \(\color{blue}{[34]}\) | 183 |
| Total | 247 | 141 | 90 | 478 |
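The \(\chi^2\) statistic. \[\chi^2_{k} = \frac{(63-61)^2}{61} + \frac{(31-35)^2}{35} + \cdots + \frac{(32-34)^2}{34} = 1.3121\] The value 1.3121 is computed from the unrounded expected counts; the rounded counts appear in the terms only for readability, so summing the displayed terms gives a slightly different number.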
Degrees of freedom. \[k = (3-1) \times (3-1) = 2(2) = 4\]
\(\chi^2_{k} = 1.3121\) and \(k = 4\)
We can use the pchisq function in R.
1 - pchisq(1.3121, 4) # compute the p-value
The p-value is approximately 0.859.
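As a cross-check, the whole test can be run with base R's `chisq.test`, which works from the unrounded expected counts. This is a sketch rather than the course's own code: the object name `kids` is illustrative, and the 4th-grade Sports count is entered as 25, the value consistent with the row total of 119 and with \(\chi^2 = 1.3121\).

```r
# Observed counts: rows are school grade, columns are the goal chosen.
kids <- matrix(c(63, 31, 25,
                 88, 55, 33,
                 96, 55, 32),
               nrow = 3, byrow = TRUE,
               dimnames = list(grade = c("4th", "5th", "6th"),
                               goal = c("Grades", "Popular", "Sports")))

# Chi-square test of independence between grade and goal.
test <- chisq.test(kids)

test$statistic  # chi-square statistic
test$parameter  # degrees of freedom: (3 - 1) * (3 - 1)
test$p.value    # p-value

# The same p-value computed directly from the statistic and df.
pchisq(unname(test$statistic), df = unname(test$parameter), lower.tail = FALSE)
```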
Do these data provide evidence to suggest that goals vary by grade? \[H_0: \text{Grade and goals are independent. Goals do not vary by grade.}\] \[H_A: \text{Grade and goals are dependent. Goals vary by grade.}\]
Since the p-value is large, we fail to reject \(H_0\). The data do not provide convincing evidence that grade and goals are dependent. It doesn’t appear that goals vary by grade.
\(\star\) Key Idea: The chi-square test assumes independent observations of categorical data with sufficiently large expected counts (a common rule of thumb is at least 5 per cell). It compares observed with expected frequencies to assess whether the deviations are larger than chance alone would produce.