MTH-361A | Spring 2026 | University of Portland
A plausible range of values for the population parameter is called a confidence interval.
Fishing analogy:
A confidence interval is like using a fishing net, rather than a spear, to catch fish in a murky lake.
The interval:
We need to set the size of the fishing net first before casting it into the lake.
\(\star\) If we report a point estimate, we probably won’t hit the exact population parameter. If we report a range of plausible values we have a good shot at capturing the parameter.
Suppose we want to estimate the number of spam emails in an account.
Data summary:
| Not Spam | Spam |
|---|---|
| 184 | 16 |
Point estimate:
\(\star\) For the purpose of this example, the sample mean notation \(\overline{x}\) can be thought of as the mean of a binomial r.v.
The point estimate for the number of spam emails is \(\overline{x} = 16\) and the sample size is \(n = 200\).
Assumptions:
Sampling distribution of \(\overline{x}\):
\(\star\) The domain of this distribution is \(0 \le x \le 200\) (the graph is truncated) because there are \(n = 200\) samples and the \(x\) value is the number of spam emails out of \(n\), where it could be \(0\), \(200\), or in between.
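The shape of this sampling distribution can be sketched numerically. The snippet below is a minimal illustration that assumes, as the R code later in this section does, that the spam count follows a binomial distribution with \(n = 200\) and \(\hat{p} = 16/200\); the variable names are chosen here for illustration only.

```r
n <- 200                       # sample size
p_hat <- 16 / n                # estimated proportion of spam emails
x <- 0:n                       # every possible spam count, from 0 to 200

# probability of each possible count under the assumed binomial model
pmf <- dbinom(x, size = n, prob = p_hat)

dist_mean <- sum(x * pmf)                        # center of the distribution
dist_sd <- sqrt(sum((x - dist_mean)^2 * pmf))    # spread of the distribution
c(dist_mean, dist_sd)
```

The center matches the point estimate of \(16\), and the spread is the quantity that later appears as the standard error.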
The confidence level is a probability that we set. It determines how wide our interval will be: a higher confidence level gives a wider interval.
Confidence Level:
Suppose we want a “\(90\)% confidence interval”. So, the confidence level is \(0.90\).
This is the interval probability of the sampling distribution.
The goal is to find \(a\) and \(b\), so that \[P(a \le \overline{X} \le b) \approx 0.90,\] where \(\overline{X}\) is a binomial r.v.
The middle probability is \(0.90\) and the tail probabilities sum to \(0.10\), with each tail equal to \(0.05\), assuming symmetry.
Probability components:
Confidence Interval:
\[10 \le \overline{x} \le 23\]
\(\star\) Interpreting this interval requires you to understand how CLT works and a basic understanding of the frequentist interpretation of probability.
Using R:
cl <- 0.90 # confidence level
cl_tail <- 1-cl # tail probabilities
lb <- qbinom(cl_tail/2,200,16/200) # lower bound
ub <- qbinom(cl+(cl_tail/2),200,16/200) # upper bound
c(lb,ub) # confidence interval as an ordered list
## [1] 10 23
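As a check, the probability that a binomial count falls between the two bounds can be computed directly with `pbinom()`. This is a sketch under the same assumed binomial model (\(n = 200\), \(\hat{p} = 16/200\)); because the binomial is discrete, the probability will not be exactly \(0.90\).

```r
n <- 200
p_hat <- 16 / n
lb <- 10                       # lower bound reported above
ub <- 23                       # upper bound reported above

# P(lb <= X <= ub) = P(X <= ub) - P(X <= lb - 1) for a discrete count
coverage <- pbinom(ub, n, p_hat) - pbinom(lb - 1, n, p_hat)
coverage
```

The result is at least \(0.90\) because `qbinom()` returns the smallest count whose cumulative probability reaches the requested level.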
We have \(16\) spam emails and \(184\) not spam emails. Both counts are more than \(10\), and the sample size \(n = 200\) is considered large enough. By this rule of thumb, we can approximate the binomial using the normal distribution.
Normal approximation:
Confidence interval assuming normality:
\[ \overline{x} \pm z^* \cdot SE \] where \(SE\) is called the standard error. Here, \(\displaystyle SE = \sqrt{n\hat{p}(1-\hat{p})}\). The term \(z^*\) is called the critical value, which can be computed using R.
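The standard error used here is the standard deviation of a binomial count. With \(\hat{p} = 16/200 = 0.08\) plugged in for the unknown \(p\), the arithmetic works out as follows:

\[ \begin{aligned} \text{Var}(X) & = np(1-p) \\ SE & = \sqrt{n\hat{p}(1-\hat{p})} = \sqrt{200(0.08)(0.92)} = \sqrt{14.72} \approx 3.837 \end{aligned} \]

Dividing this \(SE\) by \(n\) gives \(\sqrt{\hat{p}(1-\hat{p})/n}\), the standard error on the proportion scale.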
So, the interval is
\[ \begin{aligned} \overline{x} & \pm z^* \cdot \sqrt{n\hat{p}(1-\hat{p})} \\ 16 & \pm 1.645\left(\sqrt{200(0.08)(1-0.08)}\right) \end{aligned} \]
\[9.689 \le \overline{x} \le 22.311\]
Using R:
cl <- 0.90 # confidence level
cl_tail <- 1-cl # tail probabilities
lb <- qnorm(cl_tail/2,16,sqrt(200*0.08*(1-0.08))) # lower bound
ub <- qnorm(cl+(cl_tail/2),16,sqrt(200*0.08*(1-0.08))) # upper bound
c(lb,ub) # confidence interval as an ordered list
## [1] 9.689247 22.310753
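The same interval can be built from the formula \(\overline{x} \pm z^* \cdot SE\) and set beside the exact binomial interval. The sketch below assumes the values used in this example; the names `ci_normal` and `ci_binom` are introduced here for illustration.

```r
n <- 200
p_hat <- 16 / n
x_bar <- 16                                # point estimate (spam count)

z_star <- qnorm(0.95)                      # critical value for a 90% level
se <- sqrt(n * p_hat * (1 - p_hat))        # standard error of the count

ci_normal <- x_bar + c(-1, 1) * z_star * se        # normal approximation
ci_binom <- qbinom(c(0.05, 0.95), n, p_hat)        # exact binomial quantiles

rbind(ci_normal, ci_binom)
```

The two intervals are close. The binomial bounds are whole numbers because a count cannot be fractional.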
Most commercial websites (e.g. social media platforms, news outlets, online retailers) collect data about their users’ behaviors and use these data to deliver targeted content, recommendations, and ads.
To understand whether Americans think their lives line up with how algorithm-driven classification systems categorize them, Pew Research asked a representative sample of 850 American Facebook users how accurately they feel the list of categories Facebook has listed for them on the page of their supposed interests actually represents them and their interests. 67% of the respondents said that the listed categories were accurate.
Estimate the true proportion of American Facebook users who think Facebook categorizes their interests accurately.
The goal of parameter estimation is to find a range of possible values (confidence interval).
Given information
The Confidence Interval
We want to find the 95% confidence interval using the formula: \[\text{point estimate} \pm 1.96 \times \text{SE}\] where SE is the standard error.
This can be written as \[ \begin{aligned} 0.67 & \pm 1.96 \times \sqrt{\frac{0.67 (1-0.67)}{850}} \\ 0.67 & \pm 1.96 \times 0.0161 \\ & \longrightarrow (0.67-0.0316,0.67+0.0316) \\ & \longrightarrow (0.6384,0.7016) \end{aligned} \]
Thus, the 95% confidence interval for the true \(p\) is from 0.6384 to 0.7016.
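The calculation above can be reproduced in R. This is a minimal sketch using the sample values from the example; `qnorm(0.975)` supplies the critical value that the formula rounds to \(1.96\).

```r
p_hat <- 0.67                              # sample proportion
n <- 850                                   # sample size

se <- sqrt(p_hat * (1 - p_hat) / n)        # standard error of the proportion
z_star <- qnorm(0.975)                     # critical value for a 95% level
ci <- p_hat + c(-1, 1) * z_star * se       # lower and upper bounds

round(ci, 4)
```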
Which of the following is the correct interpretation of this confidence interval? We are 95% confident that:
Suppose we took many samples and built a confidence interval from each sample using the equation \[\text{point estimate} \pm 1.96 \times \text{standard error}.\]
Then about 95% of those intervals would contain the true population proportion (\(p\)).
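This repeated-sampling idea can be illustrated with a small simulation. The sketch below assumes a hypothetical true proportion of \(0.67\) (in practice \(p\) is unknown) and an arbitrary seed; it draws many samples of size \(850\), builds an interval from each, and counts how often the interval contains the assumed \(p\).

```r
set.seed(361)                  # arbitrary seed, for reproducibility only
p_true <- 0.67                 # hypothetical true proportion (an assumption)
n <- 850                       # size of each sample
reps <- 10000                  # number of simulated samples

x <- rbinom(reps, size = n, prob = p_true)     # successes in each sample
p_hat <- x / n                                 # point estimate per sample
se <- sqrt(p_hat * (1 - p_hat) / n)            # standard error per sample

lower <- p_hat - 1.96 * se
upper <- p_hat + 1.96 * se

# fraction of intervals that contain the assumed true proportion
coverage <- mean(lower <= p_true & p_true <= upper)
coverage
```

The fraction should land near \(0.95\), though it varies slightly from one simulation run to another.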
If we want to be more certain that we capture the population parameter, i.e. increase our confidence level, should we use a wider interval or a narrower interval?
\(\star\) A wider interval.
Can you see any drawbacks to using a wider interval?
\(\star\) If the interval is too wide it may not be very informative.
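The trade-off between confidence and width can be seen numerically. The sketch below reuses the Facebook example values (\(\hat{p} = 0.67\), \(n = 850\)) and compares interval widths at three confidence levels chosen here for illustration.

```r
p_hat <- 0.67
n <- 850
se <- sqrt(p_hat * (1 - p_hat) / n)        # standard error of the proportion

conf_levels <- c(0.90, 0.95, 0.99)         # illustrative confidence levels
z_star <- qnorm(1 - (1 - conf_levels) / 2) # critical value for each level
widths <- 2 * z_star * se                  # full width of each interval

data.frame(level = conf_levels, z_star = z_star, width = widths)
```

Each increase in the confidence level raises \(z^{\star}\) and therefore widens the interval.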
\[\text{point estimate} \pm z^{\star} \times \text{SE}.\]
Find the \(z^{\star}\) for a 92% confidence level.
Process:
Use the qnorm() function in R.
Using R:
cl <- 0.92 # confidence level
lt <- (1-cl)/2 # lower tail probability
qnorm(lt,0,1) # lower-tail quantile; the z star is its absolute value
## [1] -1.750686
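Since the critical value is used as a positive multiplier in \(\text{point estimate} \pm z^{\star} \times \text{SE}\), it can also be obtained directly from the upper tail. A minimal sketch, with `z_star` as an illustrative name:

```r
cl <- 0.92                     # confidence level
lt <- (1 - cl) / 2             # probability in each tail

# quantile that leaves probability lt in the upper tail
z_star <- qnorm(1 - lt)
z_star
```

By the symmetry of the standard normal distribution, this equals the absolute value of the lower-tail quantile.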