MTH-361A | Spring 2026 | University of Portland
Two scientists want to know if a certain drug is effective against high blood pressure.
Survey Question:
Which is the better way to test the drug: give it to all 1000 people, or give it to 500 people and not to the other 500?
\(\star\) The correct answer is the “500 get the drug, 500 don’t” choice, because the 500 people who do not get the drug serve as a control group for comparison.
Results:
| Answer | Count |
|---|---|
| All 1000 get the drug | 99 |
| 500 get the drug, 500 don’t | 571 |
| Total | 670 |
We would like to estimate the proportion of all Americans who have good intuition about experimental design.
What are the parameter of interest and the point estimate?
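Parameter of interest: the proportion \(p\) of all Americans who have good intuition about experimental design.
Point estimate: the sample proportion \(\hat{p} = \frac{571}{670} \approx 0.852\) of respondents who answered “500 get the drug, 500 don’t”.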
What percent of all Americans have good intuition about experimental design, i.e. would answer “500 get the drug, 500 don’t”?
Confidence Interval:
Sampling distribution:
We can use the normal approximation to the binomial distribution to simplify the sampling distribution of the sample proportion.
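CLT Conditions: the observations are independent (for example, from a random sample), and the success-failure condition holds: \(n\hat{p} \ge 10\) and \(n(1-\hat{p}) \ge 10\).
\(\star\) Requiring at least \(10\) successes and \(10\) failures is a rule of thumb; the larger the sample, the better the normal approximation.
Normal approximation: \(\hat{p}\) is approximately normal with mean \(p\) and standard deviation \(SE_{\hat{p}}\).
Standard error: \(SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\), which we estimate with \(\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\) because \(p\) is unknown.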
The GSS found that \(571\) out of \(670\) (\(85.2\)%) surveyed Americans answered the question on experimental design correctly.
Information given:
Estimate \(p\) using a \(95\)% confidence interval, i.e., a confidence level of \(0.95\).
Given: \(n = 670\), \(\hat{p} = \frac{571}{670} \approx 0.852\). First check the success-failure condition: \(n\hat{p} = 571 \ge 10\) and \(n(1-\hat{p}) = 99 \ge 10\).
The CLT conditions hold, so we can use a normal approximation of the sampling distribution of \(\hat{p}\).
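A minimal R sketch of the success-failure check, using the counts from the GSS results table. The names successes and failures are introduced here only for illustration, and the sketch checks the counts only; independence of the observations is assumed, not tested.

```r
n <- 670                  # sample size
p_hat <- 571 / n          # sample proportion (point estimate)

# Success-failure condition: at least 10 expected successes and 10 failures
successes <- n * p_hat         # 571 respondents answered correctly
failures  <- n * (1 - p_hat)   # 99 respondents did not
c(successes = successes, failures = failures)
successes >= 10 && failures >= 10   # TRUE, so the normal approximation is reasonable
```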
Confidence interval: \(\hat{p} \pm z^* \cdot SE_{\hat{p}}\)
Using R:
```r
z_star <- qnorm(0.95 + (1 - 0.95)/2, mean = 0, sd = 1) # critical value for 95% confidence
n <- 670                                # sample size
p_hat <- 571/n                          # sample proportion (point estimate)
SE_p <- sqrt((p_hat*(1 - p_hat))/n)     # standard error
cl_lb <- p_hat - z_star*SE_p            # lower bound
cl_ub <- p_hat + z_star*SE_p            # upper bound
c(cl_lb, cl_ub)                         # interval as a numeric vector: c(lower, upper)
## [1] 0.8253686 0.8791090
```
The point estimate is \(\hat{p} = \frac{571}{670} \approx 0.852\) with standard error \(SE_{\hat{p}} \approx 0.014\). For a \(0.95\) confidence level, \(z^* \approx 1.960\).
Sampling distribution of the point estimate:
\(\star\) Note that we don’t actually know \(p\); we infer plausible values for it from the sample proportion \(\hat{p}\), with some level of uncertainty.
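Confidence interval: \(\hat{p} \pm z^* \cdot SE_{\hat{p}} \approx (0.825, 0.879)\)
Interpretation: We are \(95\)% confident that the proportion of all Americans who have good intuition about experimental design is between \(82.5\)% and \(87.9\)%.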
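The \(95\)% confidence level describes the method: across many repeated samples, about \(95\)% of intervals built this way contain \(p\). A small simulation sketch of that idea, assuming for illustration only that the true proportion is \(p = 0.852\) (in practice \(p\) is unknown); the seed and number of repetitions are arbitrary choices.

```r
set.seed(361)                 # for reproducibility
p_true <- 0.852               # ASSUMED true proportion, for illustration only
n <- 670                      # sample size, as in the GSS data
reps <- 10000                 # number of simulated samples
z_star <- qnorm(0.975)        # critical value for 95% confidence

# Simulate many sample proportions from Binomial(n, p_true) counts
p_hats <- rbinom(reps, size = n, prob = p_true) / n
SE <- sqrt(p_hats * (1 - p_hats) / n)   # standard error for each sample
lower <- p_hats - z_star * SE
upper <- p_hats + z_star * SE

# Fraction of intervals that capture the true proportion (close to 0.95)
coverage <- mean(lower <= p_true & p_true <= upper)
coverage
```

The coverage will not be exactly \(0.95\), both because of simulation noise and because the interval relies on the normal approximation.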
Suppose we want to know how large a sample we need to reduce the margin of error (ME).
Margin of error: \(ME = z^* \cdot SE_{\hat{p}} = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\)
How many people should we sample in order to cut the margin of error of a \(95\)% confidence interval down to \(0.01\)?
Computing the required sample size:
\[ \begin{aligned} 1.96 \cdot \sqrt{\frac{0.852(1-0.852)}{n}} & \le 0.01 \\ 1.96^2 \times \frac{0.852(1-0.852)}{n} & \le 0.01^2 \end{aligned} \]
\[ \begin{aligned} n & \ge \left(\frac{1.96}{0.01}\right)^2 \left(0.852(1-0.852)\right) \\ n & \ge 4844.104 \end{aligned} \]
\(\star\) The sample size should be \(n \ge 4845\) (rounding \(4844.104\) up to a whole number of people) to have a margin of error of at most \(0.01\) for a \(95\)% confidence interval.
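The same calculation in R, as a sketch that plugs in the rounded values from the derivation (\(z^* = 1.96\), \(\hat{p} = 0.852\)). Using the unrounded values qnorm(0.975) and 571/670 instead gives a slightly smaller requirement, so the rounding choice matters at the margin.

```r
z_star <- 1.96     # rounded critical value, as in the derivation
p_hat  <- 0.852    # rounded point estimate, as in the derivation
ME     <- 0.01     # target margin of error

# Solve z* * sqrt(p_hat (1 - p_hat) / n) <= ME for n
n_min <- (z_star / ME)^2 * p_hat * (1 - p_hat)
n_min              # about 4844.104
ceiling(n_min)     # round up to a whole number of people: 4845
```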
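CLT conditions: independent observations, with \(n\hat{p} \ge 10\) and \(n(1-\hat{p}) \ge 10\).
Sampling distribution of the point estimate: \(\hat{p}\) is approximately normal with mean \(p\) and standard error \(SE_{\hat{p}} = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).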
Confidence interval:
\[\hat{p} \pm z^* \cdot SE_{\hat{p}}\]
\(\dagger\) Use the qnorm function in R to compute \(z^*\).
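The summary formula can be wrapped in a small R function. This is a sketch: prop_ci is a hypothetical helper name (not a base R function), and it uses the normal approximation, so it assumes the CLT conditions hold.

```r
# Hypothetical helper: normal-approximation confidence interval for one proportion
prop_ci <- function(x, n, conf_level = 0.95) {
  p_hat  <- x / n                              # point estimate
  SE     <- sqrt(p_hat * (1 - p_hat) / n)      # standard error
  z_star <- qnorm(1 - (1 - conf_level) / 2)    # critical value via qnorm
  c(lower = p_hat - z_star * SE, upper = p_hat + z_star * SE)
}

prop_ci(571, 670)   # GSS example: about 0.825 to 0.879
```

Passing a different conf_level (for example, 0.90) changes only the critical value, which is why the width of the interval depends on the confidence level.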
A local coffee shop prides itself on high customer satisfaction. The shop’s management claims that at least \(85\)% of its customers are satisfied with their service. A market research firm is hired to assess this claim by conducting a survey.
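Data: Of \(200\) surveyed customers, \(173\) reported being satisfied.
Objective: Determine whether the data provide evidence that more than \(85\)% of the shop’s customers are satisfied.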
Let \(p\) represent the true proportion of satisfied customers.
Null Hypothesis \(H_0\): The satisfaction rate is equal to \(85\)%.
\[p = 0.85\]
Significance Level: A significance level of \(\alpha = 0.05\) is chosen.
Alternative Hypothesis \(H_A\): The satisfaction rate is greater than \(85\)%.
\[p > 0.85\]
\(\star\) This is a one-tailed test because \(H_A\) uses the \(>\) sign.
The point estimate is the sample proportion \(\hat{p} = \frac{173}{200} = 0.865\).
Test statistic for one proportion:
\[z = \frac{\hat{p} - p_0}{SE_{p}}\]
Computing the test statistic:
\[ \begin{aligned} z & = \frac{0.865 - 0.85}{\sqrt{\frac{0.85(1-0.85)}{200}}} \\ z & \approx 0.594 \end{aligned} \]
\(\star\) The standard error \(SE_{p} = \sqrt{\frac{p_0(1-p_0)}{n}}\) uses the null value \(p_0\) because we assume the null hypothesis is true by default.
Determine the probability associated with the computed test statistic. Remember that this is the probability \(P(Z \ge z|H_0)\), where \(Z\) is an r.v. with the standard normal distribution.
Sampling distribution of the null value:
Using R:
```r
p_hat <- 173/200                  # sample proportion (point estimate)
p_0 <- 0.85                       # null value
n <- 200                          # sample size
SE_p <- sqrt((p_0*(1 - p_0))/n)   # standard error (uses the null value)
z <- (p_hat - p_0)/SE_p           # test statistic
# p-value (right tail)
1 - pnorm(z, 0, 1)
## [1] 0.2762265
```
\(\star\) The p-value is the probability \(P(Z \ge z|H_0) \approx 0.276\). Since this is a one-tailed test, we use only the right-tail probability.
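As a cross-check, base R's prop.test can run the same one-sided test. With correct = FALSE (no continuity correction) its p-value should match the manual computation; the default correct = TRUE applies a continuity correction and gives a somewhat larger p-value.

```r
# One-sample test of H0: p = 0.85 versus HA: p > 0.85, without continuity correction
result <- prop.test(x = 173, n = 200, p = 0.85,
                    alternative = "greater", correct = FALSE)
result$p.value   # should agree with 1 - pnorm(z) from the manual computation
```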
We compare the p-value to our chosen significance level of \(\alpha = 0.05\).
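Choices: if the p-value \(< \alpha\), reject \(H_0\); otherwise, fail to reject \(H_0\).
Conclusions: the p-value \(0.276\) is greater than \(\alpha = 0.05\).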
The hypothesis test concluded that we failed to reject \(H_0\).
Context:
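Interpretation: the survey data do not provide convincing evidence that more than \(85\)% of the shop’s customers are satisfied.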
\(\star\) The difference between the sample proportion \(\hat{p} = \frac{173}{200} = 0.865\) and the null value \(0.85\) can plausibly be explained by sampling variability alone.
Remember that we defined \(\alpha = 0.05\) arbitrarily before we conducted the hypothesis test.
The significance level \(\alpha\) is related to the confidence level of a confidence interval for the point estimate: a two-sided test at significance level \(\alpha\) corresponds to a confidence level of \(1-\alpha\).
\(\star\) The significance level \(\alpha\) is the probability of rejecting the null hypothesis when it is actually true. In other words, it is the probability of making a Type I error.
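To see the Type I error interpretation in action, here is a small simulation sketch: it generates many surveys of \(200\) customers assuming \(H_0\) is true (\(p = 0.85\)) and records how often the one-tailed z-test rejects at \(\alpha = 0.05\). The rejection rate should be close to \(\alpha\) but not exactly equal, because the counts are discrete and the test relies on the normal approximation. The seed and number of repetitions are arbitrary choices.

```r
set.seed(2026)               # for reproducibility
p_0   <- 0.85                # null value; data are simulated as if H0 were true
n     <- 200                 # sample size, as in the coffee shop survey
alpha <- 0.05                # significance level
reps  <- 20000               # number of simulated surveys

p_hats <- rbinom(reps, size = n, prob = p_0) / n      # sample proportions under H0
z      <- (p_hats - p_0) / sqrt(p_0 * (1 - p_0) / n)  # test statistics
p_vals <- pnorm(z, lower.tail = FALSE)                # right-tail p-values

# Fraction of simulated surveys that (wrongly) reject a true H0
type1_rate <- mean(p_vals < alpha)
type1_rate
```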
We need the \(95\)% confidence interval for \(p\), centered at the sample proportion (point estimate) \(\hat{p} = \frac{173}{200} = 0.865\).
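Confidence Level: \(1 - \alpha = 0.95\), so \(z^* \approx 1.960\).
Confidence Interval: \(0.865 \pm 1.960 \sqrt{\frac{0.865(1-0.865)}{200}} \approx 0.865 \pm 0.047\), i.e. \((0.818, 0.912)\).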
\(\star\) The null value of \(0.85\) is within the \(95\)% confidence interval. We would fail to reject the null hypothesis at the \(5\)% significance level.
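A minimal R sketch of this interval check. Note that the standard error here uses \(\hat{p}\) rather than \(p_0\), because a confidence interval does not assume \(H_0\) is true.

```r
p_hat  <- 173 / 200                        # sample proportion
n      <- 200                              # sample size
p_0    <- 0.85                             # null value
z_star <- qnorm(0.975)                     # critical value for 95% confidence
SE     <- sqrt(p_hat * (1 - p_hat) / n)    # SE uses p_hat for a confidence interval
ci     <- c(p_hat - z_star * SE, p_hat + z_star * SE)
ci                                         # roughly 0.818 to 0.912
ci[1] <= p_0 && p_0 <= ci[2]               # TRUE: the null value lies inside the interval
```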
Let \(p\) be the population parameter and \(p_0\) be the null value.
State the Hypotheses:
\(\dagger\) The alternative hypothesis can be \(\ne\) (two-sided), and \(<\) or \(>\) (one-sided) depending on context.
Set Significance Level \(\alpha\):
\(\star\) The significance level has to be set before looking at the p-value.
Compute the test statistic:
\[z = \frac{\hat{p}-p_0}{SE_p}\]
Determine the p-value:
If one-sided test: \(P(Z \ge z)\) when \(H_A: p > p_0\), or \(P(Z \le z)\) when \(H_A: p < p_0\).
If two-sided test: \(2P(Z \ge |z|)\).
Note that \(Z \sim N(0,1)\) is an r.v. with the standard normal distribution.
\(\dagger\) Use the pnorm function in R to compute the p-value.
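The three p-value rules can be collected in one R function. This is a sketch: one_prop_p_value is a hypothetical helper name (not a base R function), and it assumes the CLT conditions hold so the normal approximation applies.

```r
# Hypothetical helper: p-value of the one-proportion z-test
one_prop_p_value <- function(x, n, p_0, alternative = c("greater", "less", "two.sided")) {
  alternative <- match.arg(alternative)
  p_hat <- x / n                         # point estimate
  SE    <- sqrt(p_0 * (1 - p_0) / n)     # standard error under H0
  z     <- (p_hat - p_0) / SE            # test statistic
  switch(alternative,
         greater   = pnorm(z, lower.tail = FALSE),           # P(Z >= z)
         less      = pnorm(z),                               # P(Z <= z)
         two.sided = 2 * pnorm(abs(z), lower.tail = FALSE))  # 2 P(Z >= |z|)
}

one_prop_p_value(173, 200, 0.85, "greater")   # coffee shop example
```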
Sampling distribution of the null value (left one-tail):
Sampling distribution of the null value (right one-tail):
Sampling distribution of the null value (two-tail):
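Make a decision and conclusion: reject \(H_0\) if the p-value \(< \alpha\); otherwise, fail to reject \(H_0\). State the conclusion in the context of the problem.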
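The decision rule is simple enough to write as a one-line R function. decide is a hypothetical helper name used only for illustration; it treats a p-value exactly equal to \(\alpha\) as "fail to reject", which is one common convention.

```r
# Hypothetical helper: decision rule for a hypothesis test
decide <- function(p_value, alpha = 0.05) {
  if (p_value < alpha) "reject H0" else "fail to reject H0"
}

decide(0.2762265, alpha = 0.05)   # coffee shop example: "fail to reject H0"
```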
Important Notes:
\(\star\) If you rejected \(H_0\), it does not prove that \(H_0\) is false. It means that the observation would be a rare occurrence if it came from the sampling distribution under the null value.
\(\star\) If you failed to reject \(H_0\), it does not mean that \(H_0\) is “accepted”. It means that the observation is consistent with sampling variability under \(H_0\), so the data do not provide strong enough evidence against it.