Parameter Estimation

Applied Statistics

MTH-361A | Spring 2026 | University of Portland

Objectives

Confidence Intervals

A plausible range of values for the population parameter is called a confidence interval.

Fishing analogy:

A confidence interval is like using a fishing net, rather than a spear, to catch fish in a murky lake.

The interval:

We need to set the size of the fishing net first before casting it into the lake.

\(\star\) If we report a point estimate, we probably won’t hit the exact population parameter. If we report a range of plausible values we have a good shot at capturing the parameter.

Spam Emails

Suppose we want to estimate the number of spam emails in an account.

Data summary:

Not Spam    Spam
184         16

Point estimate:

\(\star\) For the purpose of this example, the sample mean notation \(\overline{x}\) can be thought of as the mean of a binomial r.v.

Sampling Distribution

The point estimate for the number of spam emails is \(\overline{x} = 16\) and the sample size is \(n = 200\).
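As a minimal sketch, the point estimate can be reproduced in R from the data summary. The variable names (`spam`, `not_spam`, `p_hat`, `x_bar`) are illustrative and not taken from the slides.

```r
# Counts from the data summary
spam <- 16
not_spam <- 184

n <- spam + not_spam   # sample size
p_hat <- spam / n      # sample proportion of spam emails
x_bar <- n * p_hat     # estimated number of spam emails (binomial mean n * p)

c(n = n, p_hat = p_hat, x_bar = x_bar)
```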

Assumptions:

Sampling distribution of \(\overline{x}\):

\(\star\) The domain of this distribution is \(0 \le x \le 200\) (the graph is truncated) because the sample contains \(n = 200\) emails and \(x\) is the number of spam emails out of \(n\), which could be \(0\), \(200\), or anything in between.
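The sketch below tabulates this sampling distribution, assuming the spam count follows a binomial distribution with \(n = 200\) and \(\hat{p} = 16/200\) (the same model the `qbinom` calls in these slides rely on). The variable names are illustrative.

```r
n <- 200
p_hat <- 16 / 200

x <- 0:n                                     # every possible spam count
probs <- dbinom(x, size = n, prob = p_hat)   # probability of each count

total_prob <- sum(probs)             # probabilities over the whole domain add to 1
dist_mean <- sum(x * probs)          # mean of the distribution, equal to n * p_hat
most_likely <- x[which.max(probs)]   # count with the highest probability

c(total_prob = total_prob, dist_mean = dist_mean, most_likely = most_likely)
```

Almost all of the probability sits on small counts, which is why a graph of this distribution is usually truncated well below \(200\).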

Confidence Interval of the Number of Spam Emails

The confidence level is a probability that we set before computing the interval. It determines how wide the interval will be: a higher confidence level gives a wider interval.

Confidence Level:

Confidence Interval:

\[10 \le \overline{x} \le 23\]

\(\star\) Interpreting this interval requires an understanding of how the CLT works and a basic understanding of the frequentist interpretation of probability.

Using R:

cl <- 0.90 # confidence level
cl_tail <- 1-cl # tail probabilities
lb <- qbinom(cl_tail/2,200,16/200) # lower bound
ub <- qbinom(cl+(cl_tail/2),200,16/200) # upper bound
c(lb,ub) # confidence interval as an ordered list
## [1] 10 23
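Because the binomial distribution is discrete, the interval between these two quantiles holds at least the requested probability rather than exactly \(0.90\). A minimal sketch of that check, under the same binomial model (the name `coverage` is illustrative):

```r
cl <- 0.90
n <- 200
p_hat <- 16 / 200

lb <- qbinom((1 - cl) / 2, n, p_hat)       # lower bound
ub <- qbinom(1 - (1 - cl) / 2, n, p_hat)   # upper bound

# P(lb <= X <= ub) = P(X <= ub) - P(X <= lb - 1)
coverage <- pbinom(ub, n, p_hat) - pbinom(lb - 1, n, p_hat)
coverage
```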

Normal Approximation

We have \(16\) spam emails and \(184\) not spam emails. Both counts are more than \(10\), so the sample size \(n = 200\) is considered large enough. By this rule of thumb, we can approximate the binomial distribution using the normal distribution.
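The rule of thumb can be written as a one-line check. This is only a sketch; the cutoff of \(10\) is the one quoted above, and the names are illustrative.

```r
spam <- 16
not_spam <- 184
threshold <- 10   # rule-of-thumb cutoff used in the text

# Both the "success" and "failure" counts must exceed the cutoff
normal_ok <- spam > threshold && not_spam > threshold
normal_ok
```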

Normal approximation:

Confidence interval assuming normality:

\[ \overline{x} \pm z^* \cdot SE \] where \(SE\) is called the standard error. Here, \(\displaystyle SE = \sqrt{n\hat{p}(1-\hat{p})}\). The term \(z^*\) is called the critical value, which can be computed using R.
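As a sketch, the two ingredients of this formula can be computed separately for a 90% confidence level. The names `z_star` and `se` are illustrative.

```r
cl <- 0.90
n <- 200
p_hat <- 0.08

z_star <- qnorm(1 - (1 - cl) / 2)     # critical value: upper quantile of the standard normal
se <- sqrt(n * p_hat * (1 - p_hat))   # standard error of the spam count

c(z_star = z_star, se = se)
```

Here `z_star` is about \(1.645\) and `se` is about \(3.837\).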

So, the interval is

\[ \begin{aligned} \overline{x} & \pm z^* \cdot \sqrt{n\hat{p}(1-\hat{p})} \\ 16 & \pm 1.645\sqrt{200(0.08)(1-0.08)} \end{aligned} \]

\[9.689 \le \overline{x} \le 22.311\]

Using R:

cl <- 0.90 # confidence level
cl_tail <- 1-cl # tail probabilities
lb <- qnorm(cl_tail/2,16,sqrt(200*0.08*(1-0.08))) # lower bound
ub <- qnorm(cl+(cl_tail/2),16,sqrt(200*0.08*(1-0.08))) # upper bound
c(lb,ub) # confidence interval as an ordered list
## [1]  9.689247 22.310753

Facebook’s categorization of user interests

Most commercial websites (e.g. social media platforms, news outlets, online retailers) collect data about their users’ behaviors and use these data to deliver targeted content, recommendations, and ads.

To understand whether Americans think their lives line up with how algorithm-driven classification systems categorize them, Pew Research asked a representative sample of 850 American Facebook users how accurately the list of categories on Facebook's page of their supposed interests represents them and their interests. 67% of the respondents said that the listed categories were accurate.

Estimate the true proportion of American Facebook users who think Facebook categorizes their interests accurately.

Point Estimate and Standard Error

The goal of parameter estimation is to find a range of possible values (confidence interval).

Given information

The Confidence Interval

We want to find the 95% confidence interval using the formula: \[\text{point estimate} \pm 1.96 \times \text{SE}\] where SE is the standard error.

This can be written as \[ \begin{aligned} 0.67 & \pm 1.96 \times \sqrt{\frac{0.67 (1-0.67)}{850}} \\ 0.67 & \pm 1.96 \times 0.0161 \\ & \longrightarrow (0.67-0.0316,0.67+0.0316) \\ & \longrightarrow (0.6384,0.7016) \end{aligned} \]

Thus, the 95% confidence interval for the true \(p\) runs from 0.6384 to 0.7016.
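The same calculation can be sketched in R. It uses `qnorm` for the critical value instead of the rounded \(1.96\), so the bounds agree with the hand calculation to about four decimal places. The variable names are illustrative.

```r
p_hat <- 0.67   # sample proportion who said the categories were accurate
n <- 850        # sample size
cl <- 0.95      # confidence level

se <- sqrt(p_hat * (1 - p_hat) / n)    # standard error of the sample proportion
z_star <- qnorm(1 - (1 - cl) / 2)      # about 1.96
ci <- p_hat + c(-1, 1) * z_star * se   # lower and upper bounds

round(ci, 4)
```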

Interpretation

Which of the following is the correct interpretation of this confidence interval? We are 95% confident that:

  1. 63.84% to 70.16% of American Facebook users in this sample think Facebook categorizes their interests accurately.
  2. 63.84% to 70.16% of all American Facebook users think Facebook categorizes their interests accurately.
  3. There is a 63.84% to 70.16% chance that a randomly chosen American Facebook user’s interests are categorized accurately.
  4. There is a 63.84% to 70.16% chance that 95% of American Facebook users’ interests are categorized accurately.

What does 95% Confident Mean?

Suppose we took many samples and built a confidence interval from each sample using the equation \[\text{point estimate} \pm 1.96 \times \text{standard error}.\]

Then about 95% of those intervals would contain the true population proportion (\(p\)).
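This repeated-sampling idea can be illustrated with a small simulation. The sketch assumes a hypothetical true proportion `p_true = 0.67` (in practice \(p\) is unknown) and simple random samples of size \(850\). The seed and the number of repetitions are arbitrary choices.

```r
set.seed(361)    # arbitrary seed, for reproducibility only

p_true <- 0.67   # hypothetical true proportion (unknown in practice)
n <- 850         # size of each sample
reps <- 10000    # number of simulated samples

p_hats <- rbinom(reps, n, p_true) / n   # one point estimate per sample
se <- sqrt(p_hats * (1 - p_hats) / n)   # one standard error per sample
lower <- p_hats - 1.96 * se
upper <- p_hats + 1.96 * se

# Fraction of the simulated intervals that contain the true proportion
coverage <- mean(lower <= p_true & p_true <= upper)
coverage
```

The reported fraction should land close to \(0.95\), varying slightly with the seed.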

Width of an interval

If we want to be more certain that we capture the population parameter, i.e. increase our confidence level, should we use a wider interval or a narrower interval?

\(\star\) A wider interval.

Can you see any drawbacks to using a wider interval?

\(\star\) If the interval is too wide it may not be very informative.
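To make the trade-off concrete, the sketch below computes the interval width for the Facebook example at three confidence levels. The levels are chosen only for illustration, and the names are not from the slides.

```r
p_hat <- 0.67
n <- 850
se <- sqrt(p_hat * (1 - p_hat) / n)

conf_levels <- c(0.90, 0.95, 0.99)
z_stars <- qnorm(1 - (1 - conf_levels) / 2)   # critical value for each level
widths <- 2 * z_stars * se                    # upper bound minus lower bound

data.frame(conf_level = conf_levels, width = round(widths, 4))
```

The width grows with the confidence level, so higher confidence is paid for with a less precise statement about \(p\).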

Changing the Confidence Level

\[\text{point estimate} \pm z^{\star} \times \text{SE}.\]
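Only \(z^{\star}\) changes when the confidence level changes. A minimal sketch for the 95% and 99.7% levels, assuming the normal model (the names are illustrative):

```r
conf_levels <- c(0.95, 0.997)

# Upper quantile of the standard normal that leaves (1 - level)/2 in each tail
z_stars <- qnorm(1 - (1 - conf_levels) / 2)
round(z_stars, 3)
```

The first value is about \(1.96\) and the second is just under \(3\).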

95% Confidence Interval

99.7% Confidence Interval

Finding \(z^{\star}\) Exactly

Find the \(z^{\star}\) for a 92% confidence level.

Process:

Using R:

cl <- 0.92 # confidence level
lt <- (1-cl)/2 # lower tail probability
qnorm(lt,0,1) # lower-tail quantile, which is -z star; z star is its absolute value
## [1] -1.750686
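For reuse, the calculation can be wrapped in a small helper that returns the positive critical value directly. The function `z_star()` is a hypothetical convenience written for this sketch, not a built-in R function.

```r
# Hypothetical helper: positive critical value for a given confidence level
z_star <- function(cl) {
  stopifnot(cl > 0, cl < 1)   # a confidence level must lie strictly between 0 and 1
  qnorm(1 - (1 - cl) / 2)     # upper-tail quantile of the standard normal
}

z_star(0.92)
```

By symmetry of the normal distribution, this is the lower-tail quantile above with its sign flipped.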