MTH-361A | Spring 2026 | University of Portland
Researchers want to know whether exposure to a neonicotinoid pesticide reduces honeybee survival rates.
They take two groups of bees raised under identical conditions:
Research questions:
\(\star\) These two questions address the same goal of the experiment with different approaches. The first question relates to confidence intervals, while the second relates to hypothesis testing.
We would like to estimate the difference of proportions between the control group and treatment group.
What are the parameter of interest and the point estimate?
Parameter of interest:
Point estimate:
What is the true difference in proportions between the treatment and control groups?
Confidence Interval:
Sampling distribution:
We can use the normal approximation of the Binomial to simplify the sampling distribution of the difference of two sample proportions.
CLT Conditions:
\(\star\) Requiring at least \(10\) “successes” and \(10\) “failures” in each group is a rule of thumb; larger samples make the normal approximation more accurate.
Normal approximation:
Standard error:
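For reference, the standard error of the difference of two independent sample proportions is usually written in the unpooled form below, with the sample proportions plugged in for the unknown \(p_A\) and \(p_B\). This is the same expression the R computation in this example evaluates.

\[SE_{\hat{p}_B - \hat{p}_A} = \sqrt{\frac{\hat{p}_A\left(1-\hat{p}_A\right)}{n_A} + \frac{\hat{p}_B\left(1-\hat{p}_B\right)}{n_B}}\]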
The results of the experiment yielded a difference of sample proportions \(\hat{p}_B - \hat{p}_A = \frac{172}{200} - \frac{140}{200} = 0.86 - 0.70 = 0.16\).
Information given:
Estimate using a \(90\)% confidence interval. This is a confidence level of \(0.90\).
Given: \(n_A = 200\), \(\hat{p}_A = \frac{140}{200} = 0.70\), and \(n_B = 200\), \(\hat{p}_B = \frac{172}{200} = 0.86\). First check conditions:
The CLT conditions hold. So, we can use a normal approximation of the sampling distribution of \(\hat{p}_B - \hat{p}_A\).
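The condition check can be sketched in R as follows. This is a minimal illustration that assumes a “success” means a bee that survived; the variable names `x_A`, `x_B`, and `counts` are introduced here for illustration only.

```r
# Success-failure check for both groups (counts taken from the example).
# Assumption: "success" = survived, "failure" = did not survive.
n_A <- 200; x_A <- 140   # group A: sample size and number of successes
n_B <- 200; x_B <- 172   # group B: sample size and number of successes

counts <- c(A_success = x_A, A_failure = n_A - x_A,
            B_success = x_B, B_failure = n_B - x_B)
counts

# TRUE when every count meets the rule-of-thumb minimum of 10
conditions_hold <- all(counts >= 10)
conditions_hold
```

The smallest of the four counts is the number of failures in group B, so that is the count that limits the approximation in this data set.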
Confidence interval:
Using R:
z_star <- qnorm(0.90+((1-0.90)/2),0,1) # critical value
n_A <- 200 # group A sample size
n_B <- 200 # group B sample size
p_hat_A <- 140/n_A # group A sample proportion
p_hat_B <- 172/n_B # group B sample proportion
p_diff <- p_hat_B-p_hat_A # difference in sample proportions (point estimate)
SE_diff <- sqrt(((p_hat_A*(1-p_hat_A))/n_A) + ((p_hat_B*(1-p_hat_B))/n_B)) # standard error
cl_lb <- p_diff - z_star*SE_diff # lower bound
cl_ub <- p_diff + z_star*SE_diff # upper bound
c(cl_lb,cl_ub) # interval as a vector (lower, upper)
## [1] 0.09314525 0.22685475
The point estimate for the true difference in proportions of the treatment and control groups is \(\hat{p}_B - \hat{p}_A = 0.86 - 0.70 = 0.16\) with standard error \(SE_{\hat{p}_B - \hat{p}_A} \approx 0.041\). For a \(0.90\) confidence level, \(z^* \approx 1.645\).
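Plugging these rounded values into the general form \(\text{point estimate} \pm z^* \cdot SE\) gives the interval by hand; the small differences from the R output are due to rounding.

\[ \begin{aligned} \left(\hat{p}_B - \hat{p}_A\right) \pm z^* \cdot SE_{\hat{p}_B - \hat{p}_A} & \approx 0.16 \pm 1.645 \times 0.041 \\ & \approx 0.16 \pm 0.067 \\ & \approx (0.093,\ 0.227) \end{aligned} \]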
\(\star\) This interval answers “Is there a real difference of proportions between the groups, or is it just by chance?” because it estimates the difference with some level of uncertainty.
Confidence interval:
Interpretation:
Let there be two independent groups \(A\) and \(B\).
CLT conditions:
Sampling distribution of the point estimate:
Confidence interval:
\[\hat{p}_{diff} \pm z^* \cdot SE_{\hat{p}_{diff}}\]
\(\dagger\) Use the qnorm function in R to compute \(z^*\).
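The general recipe can be wrapped in a small function. This is only a sketch: `two_prop_ci` and its argument names are hypothetical and not part of the course materials, and it assumes the CLT conditions have already been checked.

```r
# Hypothetical helper: normal-approximation confidence interval for p_B - p_A,
# computed from success counts (x_A, x_B) and sample sizes (n_A, n_B).
two_prop_ci <- function(x_A, n_A, x_B, n_B, conf_level = 0.90) {
  p_hat_A <- x_A / n_A
  p_hat_B <- x_B / n_B
  p_diff  <- p_hat_B - p_hat_A                       # point estimate
  se_diff <- sqrt(p_hat_A * (1 - p_hat_A) / n_A +
                  p_hat_B * (1 - p_hat_B) / n_B)     # unpooled standard error
  z_star  <- qnorm(1 - (1 - conf_level) / 2)         # critical value
  c(lower = p_diff - z_star * se_diff,
    upper = p_diff + z_star * se_diff)
}

# Honeybee example: 140/200 in group A, 172/200 in group B
ci <- two_prop_ci(140, 200, 172, 200, conf_level = 0.90)
ci
```

Changing `conf_level` to, say, `0.95` widens the interval because `z_star` grows while the standard error stays the same.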
Now, we explore the answer to the research question “Is the lower survival rate in the treatment group a real biological effect of the pesticide, or just due to sampling variability?”
Data:
Objective:
Let \(p_B - p_A\) represent the true difference in proportions between the treatment and control groups.
Null hypothesis \(H_0\): There is no difference in proportions between the treatment and control groups (the pesticide has no effect on the survival of the honeybees).
\[p_B - p_A = 0\]
Significance level: A significance level of \(\alpha = 0.10\) is chosen.
\(\dagger\) The significance level \(\alpha=0.10\) is consistent with our earlier analysis with confidence level \(0.90\) because confidence level is \(1-\alpha\).
Alternative hypothesis \(H_A\): The difference in proportions \(p_B - p_A\) is positive (the pesticide reduces the survival of the honeybees).
\[p_B - p_A > 0\]
\(\star\) This is a one-tailed (right-tailed) test because \(H_A\) uses the \(>\) sign.
The point estimate is the difference in sample proportions \(\hat{p}_B - \hat{p}_A = \frac{172}{200} - \frac{140}{200} = 0.16\).
Test statistic for two proportions:
\[z = \frac{(\hat{p}_B - \hat{p}_A) - 0}{SE_{\hat{p}_B - \hat{p}_A}}\]
Computing the test statistic:
\[ \begin{aligned} \hat{p}_{pool} & = \frac{n_A \hat{p}_A + n_B \hat{p}_B}{n_A + n_B} \\ & = \frac{140 + 172}{200 + 200} \\ \hat{p}_{pool} & = 0.78 \end{aligned} \]
\[ \begin{aligned} z & = \frac{0.16}{\sqrt{0.78\left( 1-0.78 \right)\left( \frac{1}{200} + \frac{1}{200} \right)}} \\ z & \approx 3.862 \end{aligned} \]
\(\star\) Under the null hypothesis the two population proportions are equal, so both samples estimate one common proportion. The pooled proportion \(\hat{p}_{pool}\) uses all \(n_A + n_B\) observations to estimate it, and the pooled standard error is the standard error of \(\hat{p}_B - \hat{p}_A\) computed under that assumption, which keeps the test statistic consistent with \(H_0\).
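To see how much pooling changes the standard error in this example, the two versions can be computed side by side. This is an illustrative sketch; the names `SE_unpooled` and `SE_pooled` are introduced here.

```r
# Compare the unpooled SE (used for the confidence interval)
# with the pooled SE (used for the test statistic under H_0).
n_A <- 200; n_B <- 200
p_hat_A <- 140 / n_A                      # group A sample proportion
p_hat_B <- 172 / n_B                      # group B sample proportion
p_pool  <- (140 + 172) / (n_A + n_B)      # pooled proportion

SE_unpooled <- sqrt(p_hat_A * (1 - p_hat_A) / n_A +
                    p_hat_B * (1 - p_hat_B) / n_B)
SE_pooled   <- sqrt(p_pool * (1 - p_pool) * (1 / n_A + 1 / n_B))

round(c(unpooled = SE_unpooled, pooled = SE_pooled), 4)
```

With these data the two standard errors are close (roughly 0.0406 and 0.0414), so the choice changes the test statistic only slightly.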
Determine the probability associated with the computed test statistic. Remember that this is the probability \(P(Z \ge z|H_0)\), where \(Z\) is an r.v. with the standard normal distribution.
Sampling distribution of the null value (normalized):
Using R:
n_A <- 200 # group A sample size
n_B <- 200 # group B sample size
p_A <- 140/n_A # group A sample proportion
p_B <- 172/n_B # group B sample proportion
p_pool <- (140+172)/(n_A+n_B) # pooled proportion
p_diff <- p_B - p_A # sample difference (point estimate)
p_0 <- 0 # null value
SE_pool <- sqrt(p_pool*(1-p_pool)*(1/n_A + 1/n_B)) # pooled standard error
z <- (p_diff-p_0)/SE_pool # test statistic
# p-value
1-pnorm(z,0,1)
## [1] 5.61309e-05
\(\star\) The p-value is the probability \(P(Z \ge z|H_0) \approx 0.000056\) (practically \(0\)). Since this is a one-tailed test, we only use the right-tail probability.
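As a cross-check, base R's `prop.test` can reproduce this p-value. The sketch below assumes the continuity correction is switched off (`correct = FALSE`) so that the result matches the plain pooled z-test; group B is listed first so that `alternative = "greater"` corresponds to \(p_B - p_A > 0\).

```r
# Manual pooled z statistic, as in the worked example
n_A <- 200; n_B <- 200
p_pool  <- (140 + 172) / (n_A + n_B)
SE_pool <- sqrt(p_pool * (1 - p_pool) * (1 / n_A + 1 / n_B))
z       <- (172 / n_B - 140 / n_A) / SE_pool

# Same test via prop.test: first group = B, second group = A
res <- prop.test(x = c(172, 140), n = c(200, 200),
                 alternative = "greater", correct = FALSE)

res$p.value                      # one-sided p-value
sqrt(unname(res$statistic))      # square root of the chi-squared statistic
```

`prop.test` reports a chi-squared statistic, which for two groups without the correction is the square of the z statistic computed by hand.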
We compare the p-value to our chosen significance level of \(\alpha = 0.10\).
Choices:
Conclusions:
The hypothesis test concluded that we reject \(H_0\).
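The comparison behind this decision can be written as a one-line rule. This is a minimal sketch; `decision` is a name introduced here, and the p-value is copied from the R output of the worked example.

```r
alpha   <- 0.10           # significance level chosen before the test
p_value <- 5.61309e-05    # one-sided p-value from the worked example

# Reject H_0 when the p-value falls below alpha; otherwise fail to reject
decision <- if (p_value < alpha) "reject H_0" else "fail to reject H_0"
decision
```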
Context:
Interpretation:
\(\star\) Note that this is an experiment, albeit a very simple one, rather than an observational study. So, we can draw a causal conclusion: exposure to the pesticide lowers the survival rate of the honeybees.
Earlier, we computed a \(90\)% confidence interval for the difference in proportions using the point estimate \(\hat{p}_B - \hat{p}_A = \frac{172}{200} - \frac{140}{200} = 0.16\).
Confidence Level:
Confidence Interval:
\(\star\) The null value of \(0\) is not within the \(90\)% confidence interval. We would reject the null hypothesis at the \(10\)% significance level.
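The link between the interval and the test can be checked numerically. The sketch below is an illustration under stated assumptions: it uses the unpooled standard error and a two-sided p-value, which is the version of the test that corresponds exactly to a \(90\)% interval. The worked test in these notes uses the pooled standard error and a one-sided alternative, so the match there is approximate rather than exact.

```r
n_A <- 200; n_B <- 200
p_hat_A <- 140 / n_A
p_hat_B <- 172 / n_B
p_diff  <- p_hat_B - p_hat_A
SE_diff <- sqrt(p_hat_A * (1 - p_hat_A) / n_A +
                p_hat_B * (1 - p_hat_B) / n_B)

# 90% confidence interval (lower, upper)
ci <- p_diff + c(-1, 1) * qnorm(0.95) * SE_diff

# Is the null value inside the interval?
null_value     <- 0
null_inside_ci <- (ci[1] <= null_value) && (null_value <= ci[2])

# Two-sided p-value computed with the same (unpooled) standard error
p_two_sided <- 2 * pnorm(abs(p_diff - null_value) / SE_diff, lower.tail = FALSE)

c(null_inside_ci = null_inside_ci, reject_at_alpha = p_two_sided < 0.10)
```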
Let \(p_A\) and \(p_B\) be the population proportions for groups \(A\) and \(B\), respectively, and let \(p_0\) be the null value of the difference \(p_B - p_A\).
State the Hypotheses:
\(\dagger\) The alternative hypothesis can be \(\ne\) (two-sided) or \(<\) or \(>\) (one-sided), depending on context.
\(\dagger\) Usually the null value is \(p_0 = 0\) for the null hypothesis of “no difference” or “no effect”.
Set Significance Level \(\alpha\):
\(\star\) The significance level has to be set before looking at the p-value.
Compute the test statistic:
\[z = \frac{\left(\hat{p}_B-\hat{p}_A\right)-p_0}{SE_{\hat{p}_B - \hat{p}_A}}\]
Determine the p-value:
If one-sided test:
If two-sided test:
Note that \(Z \sim N(0,1)\) is an r.v. with the standard normal distribution.
\(\dagger\) Use the pnorm function in R to compute the p-value.
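The three cases can be collected in one small function. This is a sketch only: `z_p_value` is a hypothetical helper name, and it assumes the test statistic \(z\) has already been computed.

```r
# Hypothetical helper: p-value from a z statistic for each form of H_A.
z_p_value <- function(z, alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  switch(alternative,
    greater   = pnorm(z, lower.tail = FALSE),           # P(Z >= z)
    less      = pnorm(z),                               # P(Z <= z)
    two.sided = 2 * pnorm(abs(z), lower.tail = FALSE)   # 2 * P(Z >= |z|)
  )
}

# Honeybee example (z is approximately 3.862)
z_p_value(3.862, "greater")
z_p_value(3.862, "two.sided")
```

Using `lower.tail = FALSE` instead of `1 - pnorm(z)` gives the same tail probability and avoids subtracting two nearly equal numbers when the p-value is very small.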
Sampling distribution of the null value (left one-tail):
Sampling distribution of the null value (right one-tail):
Sampling distribution of the null value (two-tail):
Make a decision and conclusion:
Important Notes:
\(\star\) Rejecting \(H_0\) does not prove that \(H_0\) is false. It means that the observed result would be a rare occurrence if it came from the sampling distribution centered at the null value.
\(\star\) Failing to reject \(H_0\) does not mean that \(H_0\) is “accepted”. It means that the observed result is consistent with what sampling variability alone could produce under \(H_0\).