Chi-Squared Tests

Applied Statistics

MTH-361A | Spring 2025 | University of Portland

March 21, 2025

Objectives

Previously…

Confidence Interval for One Proportion

\[\hat{p} \pm z^{\star} \text{SE}_{\hat{p}}\]

\[ \begin{aligned} \hat{p} & \longrightarrow \text{sample proportion (or the point estimate)} \\ z^{\star} & \longrightarrow \text{critical z-score at a given confidence level} \\ \text{SE}_{\hat{p}} & \longrightarrow \text{standard error of the sampling distribution} \\ \end{aligned} \]
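The interval above can be sketched in R. This is a minimal illustration, not part of the original slides: the sample counts are hypothetical, and the standard error is assumed to take the usual form \(\sqrt{\hat{p}(1-\hat{p})/n}\), which the slide does not spell out.

```r
# Hypothetical sample (illustrative only): 120 successes out of 400 observations
n <- 400
successes <- 120
p_hat <- successes / n

# Standard error of the sample proportion (assumed formula, see the note above)
se_p_hat <- sqrt(p_hat * (1 - p_hat) / n)

# Critical z-score for a 95% confidence level
conf_level <- 0.95
z_star <- qnorm(1 - (1 - conf_level) / 2)

# Interval: p_hat +/- z_star * SE
ci <- c(lower = p_hat - z_star * se_p_hat,
        upper = p_hat + z_star * se_p_hat)
ci
```

Changing `conf_level` changes only `z_star`; the point estimate and standard error stay the same.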

Hypothesis Testing for One Proportion

\[ \begin{aligned} p & \longrightarrow \text{population proportion} \\ \hat{p} & \longrightarrow \text{sample proportion (or the point estimate)} \\ H_0: p = p_0 & \longrightarrow \text{null hypothesis} \\ H_A: p \ne p_0 & \longrightarrow \text{alternative hypothesis (can be } < \text{ or } > \text{)} \\ z & \longrightarrow \text{test statistic} \\ \text{SE}_{p} & \longrightarrow \text{standard error of the null distribution} \\ \end{aligned} \]
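As a rough sketch of how these pieces fit together, the R code below computes a test statistic and a two-sided p-value. The values of `p0`, `n`, and `p_hat` are hypothetical, and the formulas \(\text{SE}_{p} = \sqrt{p_0(1-p_0)/n}\) and \(z = (\hat{p} - p_0)/\text{SE}_{p}\) are the standard ones, assumed here because the slide lists only the symbols.

```r
# Hypothetical values (illustrative only): null value, sample size, sample proportion
p0 <- 0.5
n <- 200
p_hat <- 0.56

# Standard error of the null distribution, computed with p0 rather than p_hat
se_null <- sqrt(p0 * (1 - p0) / n)

# Test statistic: how many standard errors p_hat lies from p0
z <- (p_hat - p0) / se_null

# Two-sided p-value for H_A: p != p0
p_value <- 2 * pnorm(-abs(z))
c(z = z, p_value = p_value)
```

For a one-sided alternative, the p-value would instead be a single tail area, for example `pnorm(z, lower.tail = FALSE)` for \(H_A: p > p_0\).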

The Chi-Squared Statistic

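The chi-squared statistic is used in hypothesis testing to determine whether observed counts differ significantly from the counts expected under the null hypothesis. It is commonly used to analyze categorical data. In the formula below, \(O\) is an observed count, \(E\) is the corresponding expected count, and the sum runs over all cells.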

\[\chi^2 = \sum \frac{(O-E)^2}{E}\]
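The formula translates almost directly into R. The helper name `chi_sq_stat` and the die-roll counts below are hypothetical, chosen only to illustrate the arithmetic.

```r
# Chi-squared statistic: sum of (O - E)^2 / E over all cells
chi_sq_stat <- function(observed, expected) {
  stopifnot(length(observed) == length(expected), all(expected > 0))
  sum((observed - expected)^2 / expected)
}

# Hypothetical counts (illustrative only): 60 rolls of a die, 10 expected per face
observed <- c(8, 12, 9, 11, 14, 6)
expected <- rep(10, 6)
chi_sq_stat(observed, expected)
```

The statistic is zero when every observed count equals its expected count, and it grows as the counts move apart.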

The Chi-Squared Distribution

Example 1

Consider the following problem description.

Students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below. Do these data provide evidence to suggest that goals vary by grade?

Grade Grades Popular Sports
4th 63 31 25
5th 88 55 33
6th 96 55 32

Source: Popular Kids Dataset. This is from a 1992 study and was revisited 30 years later.

Example 1: The Chi-Squared Test for Independence (1/2)

Example 1: The Chi-Squared Test for Independence (2/2)

Example 1: Computing the \(\chi^2\) statistic - Expected Frequency (1/3)

Grade Grades Popular Sports Total
4th \(\color{blue}{63}\) \(\color{orange}{31}\) 25 119
5th 88 55 33 176
6th 96 55 \(\color{red}{32}\) 183
Total 247 141 90 478

Note: Each color marks the cell whose expected frequency is computed below. Expected frequencies are rounded to the nearest integer.

\[\color{blue}{E_{\text{4th, Grades}} = \frac{(119)(247)}{478} \approx 61}\] \[\color{orange}{E_{\text{4th, Popular}} = \frac{(119)(141)}{478} \approx 35}\] \[\vdots\] \[\color{red}{E_{\text{6th, Sports}} = \frac{(183)(90)}{478} \approx 34}\]
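The same rule, (row total)(column total) / (grand total), can be applied to every cell at once. The sketch below assumes only the totals from the table above; the variable names are illustrative.

```r
# Row and column totals copied from the table above
row_totals <- c("4th" = 119, "5th" = 176, "6th" = 183)
col_totals <- c(Grades = 247, Popular = 141, Sports = 90)
n <- sum(row_totals)  # grand total, 478

# Expected count for each cell: (row total * column total) / grand total
expected <- outer(row_totals, col_totals) / n

# Rounded to the nearest integer, as in the note above
round(expected)
```

Because `outer()` forms every row-by-column product, the unrounded expected counts keep the original row and column totals.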

Example 1: Computing the \(\chi^2\) statistic - Expected Frequency (2/3)

Grade Grades Popular Sports Total
4th 63 31 25 119
5th 88 \(\color{green}{55}\) 33 176
6th 96 55 32 183
Total 247 141 90 478

Example 1: Computing the \(\chi^2\) statistic - Expected Frequency (3/3)

Grade Grades Popular Sports Total
4th 63 \(\color{blue}{[61]}\) 31 \(\color{blue}{[35]}\) 25 \(\color{blue}{[22]}\) 119
5th 88 \(\color{blue}{[91]}\) 55 \(\color{blue}{[52]}\) 33 \(\color{blue}{[33]}\) 176
6th 96 \(\color{blue}{[95]}\) 55 \(\color{blue}{[54]}\) 32 \(\color{blue}{[34]}\) 183
Total 247 141 90 478

Example 1: Computing the \(\chi^2\) statistic

Example 1: Computing the p-value

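# Degrees of freedom: (rows - 1) * (columns - 1) = (3 - 1) * (3 - 1)
df <- 4
# p-value: area under the chi-squared curve to the right of the statistic
1 - pchisq(0.967, df)
## [1] 0.9147579
The p-value is approximately 0.9148.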
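An equivalent way to get the same upper-tail area is the `lower.tail` argument of `pchisq()`. The sketch below also derives the degrees of freedom from the table dimensions; the variable names are illustrative.

```r
# Table dimensions for Example 1: 3 grade levels by 3 goals
n_rows <- 3
n_cols <- 3
df <- (n_rows - 1) * (n_cols - 1)  # 4

chi_sq <- 0.967  # statistic used in the computation above

# Upper-tail probability computed directly, without subtracting from 1
p_value <- pchisq(chi_sq, df, lower.tail = FALSE)
round(p_value, 4)
```

Asking for the upper tail directly avoids the subtraction, which can lose precision when the p-value is very small.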

Example 1: Conclusion

Example 2

The problem below is adapted from Section 18.4 of your textbook, OpenIntro: Introduction to Modern Statistics. Consider the research study described below.

Coffee and Depression.

Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption (Lucas et al., 2011).

Caffeinated coffee consumption
Clinical depression  1 cup / week or fewer  2-6 cups / week  1 cup / day  2-3 cups / day  4 cups / day or more  Total
Yes                  670                    373              905          564             95                    2,607
No                   11,545                 6,244            16,329       11,726          2,288                 48,132
Total                12,215                 6,617            17,234       12,290          2,383                 50,739

  1. Compute the test statistic. What is the p-value?

  2. What is the conclusion of the hypothesis test?

Example 2: Results

## [1] 0.0003266524
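One way a p-value like this could be obtained is with R's built-in `chisq.test()`, applied to the counts from the table without the totals. This is a sketch and not necessarily how the slide's result was produced; the object name `coffee` and the shortened column labels are illustrative.

```r
# Counts copied from the coffee and depression table (totals omitted)
coffee <- matrix(
  c(670, 373, 905, 564, 95,
    11545, 6244, 16329, 11726, 2288),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    depression = c("Yes", "No"),
    coffee = c("<=1 cup/week", "2-6 cups/week", "1 cup/day",
               "2-3 cups/day", ">=4 cups/day")
  )
)

# Pearson chi-squared test of independence; df = (2 - 1) * (5 - 1) = 4
result <- chisq.test(coffee)
result$statistic  # test statistic
result$parameter  # degrees of freedom
result$p.value    # p-value
```

A p-value this small falls below the usual 0.05 significance level, so the data provide evidence of an association between coffee consumption and depression. Because the study is observational, that does not establish a causal effect.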

Activity: Test Independence for Two-Way Tables

  1. Make sure you have a copy of the F 3/21 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/
Speegle, D., & Clair, B. (2021). Probability, statistics, and data: A fresh approach using R. Chapman & Hall/CRC. https://probstatsdata.com/