MTH-361A | Spring 2025 | University of Portland
March 21, 2025
Confidence Interval for One Proportion
\[\hat{p} \pm z^{\star} \text{SE}_{\hat{p}}\]
\[ \begin{aligned} \hat{p} & \longrightarrow \text{sample proportion (or the point estimate)} \\ z^{\star} & \longrightarrow \text{critical z-score at a given confidence level} \\ \text{SE}_{\hat{p}} & \longrightarrow \text{standard error of the sampling distribution} \\ \end{aligned} \]
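As a minimal sketch of the interval above, the R code below uses made-up data (56 successes in 100 trials, a hypothetical sample). It assumes the usual standard error \(\sqrt{\hat{p}(1-\hat{p})/n}\), which these notes do not spell out. All variable names are illustrative.

```r
# Hypothetical sample: 56 successes out of n = 100 (made-up numbers)
x <- 56
n <- 100
conf_level <- 0.95

p_hat <- x / n                               # sample proportion (point estimate)
se_p_hat <- sqrt(p_hat * (1 - p_hat) / n)    # assumed SE formula for a sample proportion
z_star <- qnorm(1 - (1 - conf_level) / 2)    # critical z-score at the confidence level

# Interval: p_hat plus or minus z_star * SE
ci <- c(lower = p_hat - z_star * se_p_hat,
        upper = p_hat + z_star * se_p_hat)
print(ci)
```

Changing `conf_level` changes only `z_star`; the point estimate and standard error stay the same.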
Hypothesis Testing for One Proportion
\[ \begin{aligned} p & \longrightarrow \text{population proportion} \\ \hat{p} & \longrightarrow \text{sample proportion (or the point estimate)} \\ H_0: p = p_0 & \longrightarrow \text{null hypothesis} \\ H_A: p \ne p_0 & \longrightarrow \text{alternative hypothesis (can be } < \text{ or } > \text{)} \\ z & \longrightarrow \text{test statistic} \\ \text{SE}_{p} & \longrightarrow \text{standard error of the null distribution} \\ \end{aligned} \]
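A small R sketch of this test, under stated assumptions: the data are made up (56 successes in 100 trials), the null value \(p_0 = 0.5\) is hypothetical, and the test statistic is taken to be the usual \(z = (\hat{p} - p_0)/\text{SE}_{p}\) with \(\text{SE}_{p} = \sqrt{p_0(1-p_0)/n}\), which the notes list by name but do not write out.

```r
# Hypothetical sample: 56 successes out of n = 100, testing H0: p = 0.5
x <- 56
n <- 100
p0 <- 0.5

p_hat <- x / n                        # sample proportion
se_null <- sqrt(p0 * (1 - p0) / n)    # SE of the null distribution (assumed formula)
z <- (p_hat - p0) / se_null           # test statistic
p_value <- 2 * pnorm(-abs(z))         # two-sided p-value for H_A: p != p0

cat("z =", z, " p-value =", p_value, "\n")
```

For a one-sided alternative, the p-value would instead be `pnorm(z)` (for \(<\)) or `pnorm(z, lower.tail = FALSE)` (for \(>\)).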
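The chi-square statistic is used in hypothesis testing to determine whether observed counts differ significantly from the counts expected under the null hypothesis. It is commonly used for categorical data analysis.
\[\chi^2 = \sum \frac{(O-E)^2}{E}\]
Here \(O\) is the observed count and \(E\) is the expected count in a cell, and the sum runs over all cells of the table.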
Consider the following problem description.
Students in grades 4-6 were asked whether good grades, athletic ability, or popularity was most important to them. A two-way table separating the students by grade and by choice of most important factor is shown below. Do these data provide evidence to suggest that goals vary by grade?
Grade | Grades | Popular | Sports |
---|---|---|---|
4th | 63 | 31 | 25 |
5th | 88 | 55 | 33 |
6th | 96 | 55 | 32 |
Source: Popular Kids Dataset. This is from a 1992 study and was revisited 30 years later.
Grade | Grades | Popular | Sports | Total |
---|---|---|---|---|
4th | \(\color{blue}{63}\) | \(\color{orange}{31}\) | 25 | 119 |
5th | 88 | 55 | 33 | 176 |
6th | 96 | 55 | \(\color{red}{32}\) | 183 |
Total | 247 | 141 | 90 | 478 |
Note: Each color marks the cell whose expected frequency is computed below. Each expected frequency is (row total)(column total) / (table total), rounded to the nearest integer.
\[\color{blue}{E_{\text{4th, Grades}} = \frac{(119)(247)}{478} \approx 61}\] \[\color{orange}{E_{\text{4th, Popular}} = \frac{(119)(141)}{478} \approx 35}\] \[\vdots\] \[\color{red}{E_{\text{6th, Sports}} = \frac{(183)(90)}{478} \approx 34}\]
Grade | Grades | Popular | Sports | Total |
---|---|---|---|---|
4th | 63 \(\color{blue}{[61]}\) | 31 \(\color{blue}{[35]}\) | 25 \(\color{blue}{[22]}\) | 119 |
5th | 88 \(\color{blue}{[91]}\) | 55 \(\color{blue}{[52]}\) | 33 \(\color{blue}{[33]}\) | 176 |
6th | 96 \(\color{blue}{[95]}\) | 55 \(\color{blue}{[54]}\) | 32 \(\color{blue}{[34]}\) | 183 |
Total | 247 | 141 | 90 | 478 |
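The expected frequencies for every cell can be computed at once. The sketch below is one way to do it in R, not necessarily how the table was produced. It assumes the 4th-grade Sports count is 25, the value consistent with the row total 119 and the column total 90, and the names `observed` and `expected` are illustrative.

```r
# Observed counts (rows: grade level, columns: most important goal).
# Assumption: the 4th-grade Sports cell is 25, which matches the stated totals.
observed <- matrix(c(63, 31, 25,
                     88, 55, 33,
                     96, 55, 32),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Grade = c("4th", "5th", "6th"),
                                   Goal = c("Grades", "Popular", "Sports")))

n <- sum(observed)   # table total

# Expected count per cell = (row total)(column total) / (table total)
expected <- outer(rowSums(observed), colSums(observed)) / n

print(round(expected))   # rounded to the nearest integer
```

`outer()` forms every row-total by column-total product, so no loop over cells is needed.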
The \(\chi^2\) statistic, computed with the unrounded expected frequencies. \[\chi^2_{k} = \frac{(63-61.49)^2}{61.49} + \frac{(31-35.10)^2}{35.10} + \cdots + \frac{(32-34.46)^2}{34.46} \approx 1.3121\]
Degrees of freedom. \[k = (3-1) \times (3-1) = 2(2) = 4\]
\(\chi^2_{k} = 1.3121\) and \(k = 4\)
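Both numbers can be reproduced directly from the formula \(\chi^2 = \sum (O-E)^2/E\). The R sketch below assumes the 4th-grade Sports count is 25 (the value consistent with the table totals) and uses unrounded expected frequencies. Variable names are illustrative.

```r
# Observed counts; rows are 4th, 5th, 6th grade, columns are Grades, Popular, Sports.
# Assumption: the 4th-grade Sports cell is 25, which matches the stated totals.
observed <- matrix(c(63, 31, 25,
                     88, 55, 33,
                     96, 55, 32),
                   nrow = 3, byrow = TRUE)

# Unrounded expected counts under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all cells
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
k <- (nrow(observed) - 1) * (ncol(observed) - 1)

cat("chi-square =", round(chi_sq, 4), " df =", k, "\n")
```

Summing with the rounded expected frequencies instead gives a noticeably different value, so the rounding is best kept for display only.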
We can use the `pchisq` function in R to find the upper-tail probability of \(\chi^2_{k} = 1.3121\) with \(k = 4\) degrees of freedom.
The p-value is approximately 0.8593.
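A minimal sketch of the call, using the statistic \(\chi^2_{k} = 1.3121\) and \(k = 4\) stated above. The p-value of a chi-square test is an upper-tail area, hence `lower.tail = FALSE`. The variable names are illustrative.

```r
chi_sq <- 1.3121   # test statistic from above
k <- 4             # degrees of freedom

# Upper-tail area: P(chi-square with k df >= chi_sq)
p_value <- pchisq(chi_sq, df = k, lower.tail = FALSE)
print(p_value)
```

With these inputs the call returns roughly 0.859. Writing `1 - pchisq(chi_sq, df = k)` is equivalent.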
Do these data provide evidence to suggest that goals vary by grade? \[H_0: \text{Grade and goals are independent. Goals do not vary by grade.}\] \[H_A: \text{Grade and goals are dependent. Goals vary by grade}\]
Since the p-value is large, we fail to reject \(H_0\). The data do not provide convincing evidence that grade and goals are dependent. It doesn’t appear that goals vary by grade.
The problem shown below was taken and slightly modified from your textbook OpenIntro: Introduction to Modern Statistics Section 18.4. Consider the research study described below.
Coffee and Depression.
Researchers conducted a study investigating the relationship between caffeinated coffee consumption and risk of depression in women. They collected data on 50,739 women free of depression symptoms at the start of the study in the year 1996, and these women were followed through 2006. The researchers used questionnaires to collect data on caffeinated coffee consumption, asked each individual about physician-diagnosed depression, and also asked about the use of antidepressants. The table below shows the distribution of incidences of depression by amount of caffeinated coffee consumption (Lucas et al., 2011).
Caffeinated coffee consumption by clinical depression status
Clinical depression | 1 cup / week or fewer | 2-6 cups / week | 1 cup / day | 2-3 cups / day | 4 cups / day or more | Total |
---|---|---|---|---|---|---|
Yes | 670 | 373 | 905 | 564 | 95 | 2,607 |
No | 11,545 | 6,244 | 16,329 | 11,726 | 2,288 | 48,132 |
Total | 12,215 | 6,617 | 17,234 | 12,290 | 2,383 | 50,739 |
Compute the test statistic. What is the p-value?
What is the conclusion of the hypothesis test?
\(\chi^2 = 20.932\) with \(k = (2-1)(5-1) = 4\) degrees of freedom
p-value:
## [1] 0.0003266524
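One way to reproduce these values is R's built-in `chisq.test`, sketched below on the counts from the table (totals excluded). The object name `coffee` and the shortened column labels are illustrative, and the notes may have computed the statistic differently.

```r
# Observed counts from the coffee and depression table (totals excluded)
coffee <- matrix(c(  670,  373,   905,   564,   95,
                   11545, 6244, 16329, 11726, 2288),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(
                   Depression = c("Yes", "No"),
                   Coffee = c("<=1 cup/week", "2-6 cups/week", "1 cup/day",
                              "2-3 cups/day", ">=4 cups/day")))

# Chi-square test of independence
# (continuity correction only applies to 2x2 tables, so none is used here)
result <- chisq.test(coffee)

result$statistic   # chi-square statistic
result$parameter   # degrees of freedom
result$p.value     # upper-tail p-value
```

At a 0.05 significance level (an assumed threshold, since the notes do not fix one), a p-value this small would lead to rejecting the hypothesis that depression and caffeinated coffee consumption are independent.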