Inference for Two Proportions

Applied Statistics

MTH-361A | Spring 2026 | University of Portland

Objectives

Pesticide Effects

Researchers want to know whether exposure to a neonicotinoid pesticide reduces honeybee survival rates.

They take two groups of bees raised under identical conditions:

Research questions:

\(\star\) These two questions address the same goal of the experiment with different approaches. The first question relates to confidence intervals, while the second relates to hypothesis testing.

Parameter and Point Estimate

We would like to estimate the difference of proportions between the control group and treatment group.

What are the parameter of interest and the point estimate?

Parameter of interest:

Point estimate:

Inference of Two Proportions

What is the true difference in proportions between the treatment and control groups?

Confidence Interval:

Sampling distribution:

CLT for Two Proportions

We can use the normal approximation to the binomial distribution to simplify the sampling distribution of the difference of two sample proportions.

CLT Conditions:

\(\star\) Having \(10\) as the minimum number of “successes” and “failures” in each group is a rule of thumb; larger samples make the normal approximation more reliable.
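As a quick check of the rule of thumb, the sketch below counts successes and failures in each group. It assumes the survival counts reported later in these slides (140 of 200 in group A and 172 of 200 in group B); the names `x_A`, `x_B`, `counts`, and `clt_ok` are illustrative and not part of the original material.

```r
# Success-failure check for each group (counts from the bee experiment)
n_A <- 200; x_A <- 140   # group A: sample size and number of survivors
n_B <- 200; x_B <- 172   # group B: sample size and number of survivors

# observed successes (survived) and failures (did not survive) per group
counts <- c(A_success = x_A, A_failure = n_A - x_A,
            B_success = x_B, B_failure = n_B - x_B)
counts

# TRUE only if every count meets the rule-of-thumb minimum of 10
clt_ok <- all(counts >= 10)
clt_ok
```

If `clt_ok` is `TRUE`, the normal approximation is usually considered reasonable for both groups.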

Normal approximation:

Standard error:
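For reference, the standard error used in the R computation later in these slides can be written as below. This is the unpooled form, in which each sample proportion contributes its own variance term; under the CLT conditions, \(\hat{p}_B - \hat{p}_A\) is then approximately normal with mean \(p_B - p_A\) and this standard deviation.

\[SE_{\hat{p}_B - \hat{p}_A} = \sqrt{\frac{\hat{p}_A\left(1-\hat{p}_A\right)}{n_A} + \frac{\hat{p}_B\left(1-\hat{p}_B\right)}{n_B}}\]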

Inferring the True Difference of Two Proportions

The results of the experiment yielded a difference of sample proportions \(\hat{p}_B - \hat{p}_A = \frac{172}{200} - \frac{140}{200} = 0.86 - 0.70 = 0.16\).

Information given:

Confidence interval:

Using R:

z_star <- qnorm(0.90+((1-0.90)/2),0,1) # critical value
n_A <- 200 # group A sample size
n_B <- 200 # group B sample size
p_hat_A <- 140/n_A # group A sample proportion
p_hat_B <- 172/n_B # group B sample proportion
p_diff <- p_hat_B-p_hat_A # difference in sample proportions (point estimate)
SE_diff <- sqrt(((p_hat_A*(1-p_hat_A))/n_A) + ((p_hat_B*(1-p_hat_B))/n_B)) # standard error
cl_lb <- p_diff - z_star*SE_diff # lower bound
cl_ub <- p_diff + z_star*SE_diff # upper bound
c(cl_lb,cl_ub) # interval as a vector (lower, upper)
## [1] 0.09314525 0.22685475
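The hand computation above can be cross-checked with base R's `prop.test()`. This is only a sketch of one possible check, not the method used in the slides: it assumes group B is listed first so that the reported interval is for \(p_B - p_A\), and it sets `correct = FALSE` to turn off the continuity correction. The name `ci_check` is illustrative.

```r
# Cross-check of the 90% interval with prop.test();
# correct = FALSE disables the continuity correction so the
# interval uses the same unpooled standard error as the hand computation
ci_check <- prop.test(x = c(172, 140), n = c(200, 200),
                      conf.level = 0.90, correct = FALSE)

# interval for (first group) - (second group), here p_B - p_A
ci_check$conf.int
```

With the continuity correction left on (the default), the interval would be slightly wider.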

Interpretation of the Confidence Interval

The point estimate for the true difference in proportions of the treatment and control groups is \(\hat{p}_B - \hat{p}_A = 0.86 - 0.70 = 0.16\) with standard error \(SE_{\hat{p}_B - \hat{p}_A} \approx 0.041\). For a \(0.90\) confidence level, \(z^* \approx 1.645\).
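Plugging these rounded values into the interval formula gives the following worked arithmetic. Because it uses rounded inputs, the last digit can differ slightly from the R output.

\[ \begin{aligned} \left(\hat{p}_B - \hat{p}_A\right) \pm z^* \cdot SE_{\hat{p}_B - \hat{p}_A} & \approx 0.16 \pm 1.645 \times 0.041 \\ & \approx 0.16 \pm 0.067 \\ & \approx (0.093,\ 0.227) \end{aligned} \]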

\(\star\) This interval answers “Is there a real difference of proportions between the groups or is it just by chance?” because it estimates the difference with some level of uncertainty.

Confidence interval:

Interpretation:

Summary of Parameter Estimation for Two Proportions

Let there be two independent groups \(A\) and \(B\).

CLT conditions:

Sampling distribution of the point estimate:

Confidence interval:

\[\hat{p}_{diff} \pm z^* \cdot SE_{\hat{p}_{diff}}\]

\(\dagger\) Use the qnorm function in R to compute \(z^*\).
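The summary above can be collected into a small helper. This is a minimal sketch, not code from the original slides: the function name `two_prop_ci`, its argument names, and the default `conf_level = 0.95` are all assumptions made for illustration.

```r
# Hypothetical helper: confidence interval for p_B - p_A
# x_A, x_B: number of successes; n_A, n_B: sample sizes
two_prop_ci <- function(x_A, n_A, x_B, n_B, conf_level = 0.95) {
  p_hat_A <- x_A / n_A                      # group A sample proportion
  p_hat_B <- x_B / n_B                      # group B sample proportion
  p_diff  <- p_hat_B - p_hat_A              # point estimate
  # unpooled standard error of the difference
  se_diff <- sqrt(p_hat_A * (1 - p_hat_A) / n_A +
                  p_hat_B * (1 - p_hat_B) / n_B)
  z_star  <- qnorm(1 - (1 - conf_level) / 2)  # critical value
  c(lower = p_diff - z_star * se_diff,
    upper = p_diff + z_star * se_diff)
}

# bee experiment: 140/200 survivors in group A, 172/200 in group B
two_prop_ci(140, 200, 172, 200, conf_level = 0.90)
```

The helper does not check the CLT conditions, so those should be verified separately before relying on the interval.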

Bee Survival

Now, we explore the answer to the research question “Is the lower survival rate in the treatment group a real biological effect of the pesticide, or just due to sampling variability?”

Data:

Objective:

Define Hypothesis

Let \(p_B - p_A\) represent the true difference in proportions between the treatment and control groups.

Null hypothesis \(H_0\): There is no difference in proportions between the treatment and control groups (the pesticide has no effect on the survival of the honeybees).

\[p_B - p_A = 0\]

Significance level: A significance level of \(\alpha = 0.10\) is chosen.

\(\dagger\) The significance level \(\alpha=0.10\) is consistent with our earlier analysis with confidence level \(0.90\) because confidence level is \(1-\alpha\).

Alternative hypothesis \(H_A\): The true difference in proportions is positive (the pesticide lowers the survival of the honeybees).

\[p_B - p_A > 0\]

\(\star\) This is a one-tailed test because \(H_A\) uses the \(>\) sign.

Compute the Test Statistic

The point estimate is the difference in sample proportions \(\hat{p}_B - \hat{p}_A = \frac{172}{200} - \frac{140}{200} = 0.16\).

Test statistic for two proportions:

\[z = \frac{(\hat{p}_B - \hat{p}_A) - 0}{SE_{\hat{p}_B - \hat{p}_A}}\]

Computing the test statistic:

\[ \begin{aligned} \hat{p}_{pool} & = \frac{n_A \hat{p}_A + n_B \hat{p}_B}{n_A + n_B} \\ & = \frac{140 + 172}{200 + 200} \\ \hat{p}_{pool} & = 0.78 \end{aligned} \]

\[ \begin{aligned} z & = \frac{0.16}{\sqrt{0.78\left( 1-0.78 \right)\left( \frac{1}{200} + \frac{1}{200} \right)}} \\ z & \approx 3.862 \end{aligned} \]

\(\star\) The pooled proportion gives a single estimate of the common population proportion assumed under the null hypothesis. Using it in the standard error keeps the test statistic consistent with the assumption that the two populations have the same proportion.
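To see how much the choice matters in this example, the sketch below computes both standard errors side by side. The variable names `SE_unpooled` and `SE_pooled` are illustrative.

```r
n_A <- 200; n_B <- 200
p_hat_A <- 140 / n_A                      # group A sample proportion
p_hat_B <- 172 / n_B                      # group B sample proportion
p_pool  <- (140 + 172) / (n_A + n_B)      # pooled proportion

# unpooled: each group keeps its own proportion (confidence intervals)
SE_unpooled <- sqrt(p_hat_A * (1 - p_hat_A) / n_A +
                    p_hat_B * (1 - p_hat_B) / n_B)

# pooled: one common proportion, as assumed under H_0 (hypothesis test)
SE_pooled <- sqrt(p_pool * (1 - p_pool) * (1 / n_A + 1 / n_B))

round(c(unpooled = SE_unpooled, pooled = SE_pooled), 4)
```

Here the two values are close (about 0.0406 and 0.0414), so the pooled version changes the test statistic only slightly.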

Determine the P-value

Determine the probability associated with the computed test statistic. Remember that this is the probability \(P(Z \ge z|H_0)\), where \(Z\) is a random variable with the standard normal distribution.

Sampling distribution of the null value (normalized):

Using R:

n_A <- 200 # group A sample size
n_B <- 200 # group B sample size
p_A <- 140/n_A # group A sample proportion
p_B <- 172/n_B # group B sample proportion
p_pool <- (140+172)/(n_A+n_B) # pooled proportion
p_diff <- p_B - p_A # sample difference (point estimate)
p_0 <- 0 # null value
SE_pool <- sqrt(p_pool*(1-p_pool)*(1/n_A + 1/n_B)) # pooled standard error
z <- (p_diff-p_0)/SE_pool # test statistic

# p-value
1-pnorm(z,0,1) 
## [1] 5.61309e-05

\(\star\) The p-value is the probability \(P(Z \ge z|H_0) \approx 0.000056\) (practically \(0\)). Since this is a one-tailed test, we only use the right-tail probability.
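The same test can be cross-checked with base R's `prop.test()`. This is a sketch under stated assumptions rather than the approach taken in the slides: group B is listed first so that `alternative = "greater"` corresponds to \(p_B - p_A > 0\), and `correct = FALSE` disables the continuity correction. The name `test_check` is illustrative.

```r
# One-sided two-proportion test without continuity correction
test_check <- prop.test(x = c(172, 140), n = c(200, 200),
                        alternative = "greater", correct = FALSE)

# prop.test() reports X-squared, which equals z^2 for two groups
sqrt(unname(test_check$statistic))

# right-tail p-value
test_check$p.value
```

The square root of the reported statistic should agree with the \(z\) computed by hand, and the p-value with the `pnorm` result.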

Make a Decision and Conclusion

We compare the p-value to our chosen significance level of \(\alpha = 0.10\).
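The comparison can be written as a simple rule in R. This is only an illustrative sketch; the names `alpha`, `p_value`, and `decision` are assumptions, and the p-value is copied from the R output in these slides.

```r
alpha   <- 0.10          # chosen significance level
p_value <- 5.61309e-05   # p-value from the earlier R output

# reject H_0 when the p-value falls below the significance level
decision <- if (p_value < alpha) "reject H_0" else "fail to reject H_0"
decision
```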

Choices:

Conclusions:

Interpretation of the Hypothesis Test

The hypothesis test concluded that we reject \(H_0\).

Context:

Interpretation:

\(\star\) Note that this is an experiment, albeit a very simple one, so we can draw a causal conclusion: the pesticide can cause lower survival rates in honeybees.

Confidence Interval in Relation to Hypothesis Testing

Earlier, we computed a \(90\)% confidence interval for the true difference in proportions, centered at the point estimate \(\hat{p}_B - \hat{p}_A = \frac{172}{200} - \frac{140}{200} = 0.16\).

Confidence Level:

Confidence Interval:

\(\star\) The null value of \(0\) is not within the \(90\)% confidence interval. We would reject the null hypothesis at the \(10\)% significance level.
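A short check of this statement, using the interval bounds from the earlier R output. The names `ci`, `null_value`, and `contains_null` are illustrative.

```r
ci <- c(lower = 0.09314525, upper = 0.22685475)  # 90% CI from the earlier output
null_value <- 0                                  # "no difference"

# TRUE if the null value lies inside the interval
contains_null <- ci[["lower"]] <= null_value && null_value <= ci[["upper"]]
contains_null
```

A result of `FALSE` means the null value lies outside the interval, which agrees with rejecting \(H_0\).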

Summary of Hypothesis Testing for Two Proportions (1/3)

Let \(p_A\) and \(p_B\) be the population proportions for groups \(A\) and \(B\), respectively, and let \(p_0\) be the null value for the difference \(p_B - p_A\).

State the Hypotheses:

\(\dagger\) The alternative hypothesis can be \(\ne\) (two-sided) and \(<\) or \(>\) (one-sided) depending on context.

\(\dagger\) Usually the null value is \(p_0 = 0\) for the null hypothesis of “no difference” or “no effect”.

Set Significance Level \(\alpha\):

\(\star\) The significance level has to be set before looking at the p-value.

Summary of Hypothesis Testing for Two Proportions (2/3)

Compute the test statistic:

\[z = \frac{\left(\hat{p}_B-\hat{p}_A\right)-p_0}{SE_{\hat{p}_B - \hat{p}_A}}\]

Determine the p-value:

\(\dagger\) Use the pnorm function in R to compute the p-value.

Sampling distribution of the null value (left one-tail):

Sampling distribution of the null value (right one-tail):

Sampling distribution of the null value (two-tail):
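The three cases can be computed with `pnorm` as sketched below. The value of `z` is the rounded test statistic from the bee example, and the names `p_left`, `p_right`, and `p_two` are illustrative.

```r
z <- 3.862  # test statistic from the bee example (rounded)

p_left  <- pnorm(z)                               # H_A uses <  (left tail)
p_right <- pnorm(z, lower.tail = FALSE)           # H_A uses >  (right tail)
p_two   <- 2 * pnorm(abs(z), lower.tail = FALSE)  # H_A uses != (both tails)

c(left = p_left, right = p_right, two_sided = p_two)
```

Using `lower.tail = FALSE` is equivalent to `1 - pnorm(z)` but avoids loss of precision for very small tail probabilities.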

Summary of Hypothesis Testing for Two Proportions (3/3)

Make a decision and conclusion:

Important Notes:

\(\star\) Rejecting \(H_0\) does not prove that \(H_0\) is false. It means that the observation would be a rare occurrence if it came from the sampling distribution under the null value.

\(\star\) Failing to reject \(H_0\) does not mean that \(H_0\) is “accepted”. It means that the observation is consistent with what sampling variability alone could produce, so the data do not provide sufficient evidence against \(H_0\).