Inference for One-Way and Two-Way Tables

Elementary Statistics

MTH-161D | Spring 2025 | University of Portland

March 28, 2025

Objectives

These slides are derived from Diez et al. (2012).

Previously… (1/3)

Confidence Interval for One Proportion

\[\hat{p} \pm z^{\star} \text{SE}_{\hat{p}}\]

\[ \begin{aligned} \hat{p} & \longrightarrow \text{sample proportion (or the point estimate)} \\ z^{\star} & \longrightarrow \text{critical z-score at a given confidence level} \\ \text{SE}_{\hat{p}} & \longrightarrow \text{standard error of the sampling distribution} \\ \end{aligned} \]
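
As a quick illustration of the interval above, here is a minimal R sketch. The sample (60 successes out of 100) and the 95% confidence level are hypothetical values chosen for the example, not taken from these slides, and the standard error uses the usual formula \(\sqrt{\hat{p}(1-\hat{p})/n}\).

```r
# Hypothetical sample (not from the slides): 60 successes in 100 observations
successes <- 60
n <- 100

p_hat <- successes / n                # sample proportion (point estimate)
se <- sqrt(p_hat * (1 - p_hat) / n)   # standard error of p_hat
z_star <- qnorm(0.975)                # critical z-score for 95% confidence

# Confidence interval: p_hat +/- z_star * SE
ci <- c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
print(round(ci, 3))
```

Changing the argument of `qnorm()` (for example `0.95` for a 90% interval) changes the confidence level.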

Previously… (2/3)

Hypothesis Testing for One Proportion

\[ \begin{aligned} p & \longrightarrow \text{population proportion} \\ \hat{p} & \longrightarrow \text{sample proportion (or the point estimate)} \\ H_0: p = p_0 & \longrightarrow \text{null hypothesis} \\ H_A: p \ne p_0 & \longrightarrow \text{alternative hypothesis (can be } < \text{ or } > \text{)} \\ z & \longrightarrow \text{test statistic} \\ \text{SE}_{p} & \longrightarrow \text{standard error of the null distribution} \\ \end{aligned} \]
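
The pieces listed above can be combined in a short R sketch. The data (60 successes out of 100) and the null value \(p_0 = 0.5\) are hypothetical, and the sketch assumes the usual test statistic \(z = (\hat{p} - p_0)/\text{SE}\) with the standard error computed under the null hypothesis.

```r
# Hypothetical data and null value (not from the slides)
successes <- 60
n <- 100
p0 <- 0.5

p_hat <- successes / n               # sample proportion
se_null <- sqrt(p0 * (1 - p0) / n)   # standard error under H0 (uses p0)
z <- (p_hat - p0) / se_null          # test statistic

# Two-sided p-value for H_A: p != p0
p_value <- 2 * pnorm(-abs(z))
print(c(z = z, p_value = p_value))
```

For a one-sided alternative, the p-value would instead be a single tail area, such as `pnorm(z, lower.tail = FALSE)` for \(H_A: p > p_0\).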

Previously… (3/3)

Relationship Between Variables

\[\text{explanatory variable} \xrightarrow{\text{might affect}} \text{response variable}\]

Associated vs Independent Variables

Example 1

The contingency table below shows the distribution of survival and different classes of passengers on the Titanic.

      1st  2nd  3rd  crew   Sum
no    123  166  528   679  1496
yes   201  118  181   211   711
Sum   324  284  709   890  2207

Goals:

  1. To analyze whether the observed survival counts within a single class differ significantly from the expected counts (a one-way table).
  2. To test whether class and survived are independent by comparing observed and expected frequencies in a two-way table.

The Chi-Squared Test

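The chi-square statistic is used in hypothesis testing to determine whether observed counts differ significantly from expected counts. It is commonly used for categorical data analysis. In the formula, \(O\) is an observed count and \(E\) is the corresponding expected count; the sum runs over all categories or cells.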

\[\chi^2 = \sum \frac{(O-E)^2}{E}\]
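
The formula translates almost directly into R. The sketch below uses a hypothetical helper name, `chi_sq_stat`, and made-up counts purely to show the arithmetic.

```r
# Hypothetical helper: sum of (O - E)^2 / E over all categories
chi_sq_stat <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  sum((observed - expected)^2 / expected)
}

# Made-up counts for illustration (not from the slides)
observed <- c(20, 30, 50)
expected <- c(25, 25, 50)

# (20-25)^2/25 + (30-25)^2/25 + (50-50)^2/50 = 1 + 1 + 0
print(chi_sq_stat(observed, expected))
```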

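The degrees of freedom (df) in a chi-square test depend on the type of test being conducted: for a one-way table with \(k\) categories, \(df = k - 1\); for a two-way table with \(R\) rows and \(C\) columns, \(df = (R - 1)(C - 1)\).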
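
As a small sketch of the standard rules (\(k - 1\) for a one-way table with \(k\) categories, \((R - 1)(C - 1)\) for a two-way table with \(R\) rows and \(C\) columns), the helper names below are hypothetical and only wrap the arithmetic.

```r
# Hypothetical helpers for the two standard df rules
df_one_way <- function(k) k - 1
df_two_way <- function(n_rows, n_cols) (n_rows - 1) * (n_cols - 1)

print(df_one_way(2))      # one class, two outcomes (no / yes)
print(df_two_way(2, 4))   # Titanic table: 2 outcomes by 4 classes
print(df_two_way(3, 3))   # a 3 x 3 table
```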

Example 1

1st Class Distribution

\[ \begin{aligned} H_0: & \text{The observed distribution of survived among 1st class passengers follows the expectation.} \\ H_A: & \text{The observed distribution of survived among 1st class passengers differs from the expectation.} \end{aligned} \]

1st
no 123
yes 201

The expected counts in the calculation assume an even split between the two outcomes: with 324 first-class passengers, each expected count is \(324/2 = 162\).

\[ \begin{aligned} \chi^2 & = \frac{(123 - 162)^2}{162} + \frac{(201 - 162)^2}{162} \\ & = 18.78 \end{aligned} \]

df <- 1 # degrees of freedom: 2 categories - 1
chisq <- 18.78 # chi-square statistic computed above
1 - pchisq(chisq, df) # p-value: upper-tail area beyond the statistic
## [1] 1.466974e-05
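
The hand calculation can be cross-checked from the raw counts. This is a sketch that assumes the expected counts are an even split (162 each), as in the calculation above; because it keeps the unrounded statistic, the p-value may differ from the printed one in its later digits. `chisq.test()` on a vector of counts tests against equal probabilities by default.

```r
observed <- c(no = 123, yes = 201)
expected <- rep(sum(observed) / 2, 2)   # even split: 162 in each category

chisq <- sum((observed - expected)^2 / expected)   # chi-square statistic
df <- length(observed) - 1                         # 2 categories - 1
p_value <- pchisq(chisq, df, lower.tail = FALSE)   # upper-tail area

print(c(chisq = chisq, df = df, p_value = p_value))

# Built-in goodness-of-fit test (equal probabilities by default)
builtin <- chisq.test(observed)
print(builtin)
```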

Example 1: The Chi-Square Sampling Distribution

Example 2: Inference for One-Way Table

2nd Class Distribution

\[ \begin{aligned} H_0: & \text{The observed distribution of survived among 2nd class passengers follows the expectation.} \\ H_A: & \text{The observed distribution of survived among 2nd class passengers differs from the expectation.} \end{aligned} \]

2nd
no 166
yes 118

\(\dagger\) Determine the \(\chi^2\) test statistic and the p-value. What is your conclusion?

Example 3

      1st  2nd  3rd  crew   Sum
no    123  166  528   679  1496
yes   201  118  181   211   711
Sum   324  284  709   890  2207

\[ \begin{aligned} H_0: & \text{There is no association between class and survived. The variables are independent.} \\ H_A: & \text{There is an association between class and survived. The variables are dependent.} \end{aligned} \]
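
One way to carry out this test in R is sketched below. The matrix is typed in from the table above (without the Sum row and column), and the object name `titanic` is just a label chosen for this example.

```r
# Observed counts from the table above (margins excluded)
titanic <- matrix(
  c(123, 166, 528, 679,
    201, 118, 181, 211),
  nrow = 2, byrow = TRUE,
  dimnames = list(survived = c("no", "yes"),
                  class = c("1st", "2nd", "3rd", "crew"))
)

# Chi-square test of independence between survived and class
result <- chisq.test(titanic)

print(result$statistic)   # chi-square statistic
print(result$parameter)   # degrees of freedom: (2 - 1) * (4 - 1)
print(result$p.value)     # p-value
```

A small p-value here would count as evidence against independence of class and survived.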

Example 3

Consider the following problem description.

Students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below. Do these data provide evidence to suggest that goals vary by grade?

     Grades  Popular  Sports
4th      63       31      23
5th      88       55      33
6th      96       55      32

Source: Popular Kids Dataset. This is from a 1992 study and was revisited 30 years later.

Example 3: The Chi-Squared Test for Independence (1/2)

Example 3: The Chi-Squared Test for Independence (2/2)

Example 3: Computing the \(\chi^2\) statistic - Expected Frequency (1/3)

Grades Popular Sports Total
4th \(\color{blue}{63}\) \(\color{orange}{31}\) 23 119
5th 88 55 33 176
6th 96 55 \(\color{red}{32}\) 183
Total 247 141 90 478

Note: Each color marks a cell and its matching calculation. Each expected frequency is (row total)(column total) / (table total), rounded to the nearest integer.

\[\color{blue}{E_{4th,Grades} = \frac{(119)(247)}{478} \approx 61}\]
\[\color{orange}{E_{4th,Popular} = \frac{(119)(141)}{478} \approx 35}\]
\[\vdots\]
\[\color{red}{E_{6th,Sports} = \frac{(183)(90)}{478} \approx 34}\]
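
All nine expected frequencies can be computed at once from the margins. The sketch below takes the row totals, column totals, and table total exactly as printed in the table above; the vector names are labels chosen for this example, and the values are left unrounded, so they may differ slightly from rounded figures on the slides.

```r
# Margins as printed in the table above
row_totals <- c("4th" = 119, "5th" = 176, "6th" = 183)
col_totals <- c(Grades = 247, Popular = 141, Sports = 90)
n <- 478

# Expected count for each cell: (row total * column total) / table total
expected <- outer(row_totals, col_totals) / n

print(round(expected, 1))
```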

Example 3: Computing the \(\chi^2\) statistic - Expected Frequency (2/3)

Grades Popular Sports Total
4th 63 31 23 119
5th 88 \(\color{green}{55}\) 33 176
6th 96 55 32 183
Total 247 141 90 478

Example 3: Computing the \(\chi^2\) statistic - Expected Frequency (3/3)

Grades Popular Sports Total
4th 63 \(\color{blue}{[61]}\) 31 \(\color{blue}{[35]}\) 23 \(\color{blue}{[23]}\) 119
5th 88 \(\color{blue}{[91]}\) 55 \(\color{blue}{[52]}\) 33 \(\color{blue}{[33]}\) 176
6th 96 \(\color{blue}{[95]}\) 55 \(\color{blue}{[54]}\) 32 \(\color{blue}{[34]}\) 183
Total 247 141 90 478

Example 3: Computing the \(\chi^2\) statistic

Example 3: Computing the p-value

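df <- 4 # degrees of freedom: (3 - 1) * (3 - 1) for a 3 x 3 table
chisq <- 0.967 # chi-square statistic
1 - pchisq(chisq, df) # p-value: upper-tail area beyond the statistic
## [1] 0.9147579
The p-value is approximately 0.9148.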
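
As a cross-check, the whole test can be run with `chisq.test()` on the nine observed counts as printed in the table. This is a sketch: the object name `goals` and the dimension labels are chosen for this example, and `chisq.test()` derives unrounded expected counts from the margins of the matrix it is given, so intermediate values can differ from rounded hand calculations. The statistic and p-value it reports can be compared with the 0.967 and 0.9148 shown above.

```r
# Observed counts as printed in the two-way table (totals excluded)
goals <- matrix(
  c(63, 31, 23,
    88, 55, 33,
    96, 55, 32),
  nrow = 3, byrow = TRUE,
  dimnames = list(grade = c("4th", "5th", "6th"),
                  goal = c("Grades", "Popular", "Sports"))
)

result <- chisq.test(goals)

print(result$statistic)   # chi-square statistic
print(result$parameter)   # degrees of freedom
print(result$p.value)     # p-value
```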

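Example 3: Conclusion

The p-value (0.9148) is far larger than a typical significance level such as \(\alpha = 0.05\), so we fail to reject \(H_0\). These data do not provide convincing evidence that goals vary by grade.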

Summary: Steps for \(\chi^2\) Tests

\(\star\) Key Idea: The chi-square test assumes independent categorical data with sufficiently large expected counts and compares observed vs. expected frequencies to assess whether deviations are due to chance.
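
The "sufficiently large expected counts" condition can be checked before trusting the test. The sketch below uses the Titanic counts from Example 1 and assumes the common rule of thumb that every expected count should be at least 5; that threshold is an assumption of this example, not something stated on the slides.

```r
# Titanic counts from the contingency table (margins excluded)
titanic <- matrix(
  c(123, 166, 528, 679,
    201, 118, 181, 211),
  nrow = 2, byrow = TRUE,
  dimnames = list(survived = c("no", "yes"),
                  class = c("1st", "2nd", "3rd", "crew"))
)

min_expected <- 5   # assumed rule-of-thumb threshold (not from the slides)

# Expected counts under independence, as computed by chisq.test()
expected <- chisq.test(titanic)$expected
print(round(expected, 1))

# TRUE if every cell meets the threshold
counts_ok <- all(expected >= min_expected)
print(counts_ok)
```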

Activity: Test for Independence

  1. Make sure you have a copy of the F 3/28 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/