Parameter Estimation

Applied Statistics

MTH-361A | Spring 2025 | University of Portland

March 14, 2025

Objectives

Previously… (1/3)

The guiding principle of statistics is statistical thinking.

Statistical Thinking in the Data Science Life Cycle

Previously… (2/3)

Types of Inference

| | Parameter Estimation | Hypothesis Testing |
|---|---|---|
| Goal | Estimate an unknown population value | Assess claims about a population value |
| Methods | Point Estimation: A single value estimate (e.g., sample mean). Interval Estimation: A range of plausible values (e.g., confidence interval). | State a null and an alternative hypothesis. Compute a test statistic and compare it to a threshold (p-value or critical value). |
| Key Concept | Focuses on precision in estimation (confidence intervals) | Focuses on decision-making based on evidence (reject or fail to reject the null hypothesis) |

Previously… (3/3)

The standard normal distribution is the normal distribution with \(\mu=0\) and \(\sigma=1\), written \(Z \sim \text{N}(0,1)\).

The transformation formula (the z-score)

A z-score is a standardized score that measures how many standard deviations a value is from the mean: \[Z = \frac{X - \mu}{\sigma}\]
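The transformation can be sketched in a few lines of Python. The function name `z_score` and the example values (a mean of 100 and a standard deviation of 15) are assumptions chosen for illustration, not values from the course.

```python
def z_score(x, mu, sigma):
    """Return how many standard deviations x lies from the mean mu."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma


# Hypothetical example values (not from the slides): mean 100, SD 15.
z = z_score(130, mu=100, sigma=15)
print(z)  # 2.0, i.e. the value 130 is two standard deviations above the mean
```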

The standard normal distribution, \(Z \sim \text{N}(0,1)\).

Confidence Intervals

A plausible range of values for the population parameter is called a confidence interval.

Analogy

\(\star\) Key Idea: If we report a point estimate, we probably won’t hit the exact population parameter. If we report a range of plausible values we have a good shot at capturing the parameter.

Case Study I

Facebook’s categorization of user interests

Most commercial websites (e.g. social media platforms, news outlets, online retailers) collect data about their users' behaviors and use these data to deliver targeted content, recommendations, and ads.

To understand whether Americans think their lives line up with how algorithm-driven classification systems categorize them, Pew Research asked a representative sample of 850 American Facebook users how accurately they feel the list of categories Facebook has assigned to them, shown on the page of their supposed interests, actually represents them and their interests. 67% of the respondents said that the listed categories were accurate.

Estimate the true proportion of American Facebook users who think Facebook categorizes their interests accurately.

Case Study I: Point Estimate and Standard Error

The goal of parameter estimation is to find a range of possible values (confidence interval).

Given information

The Confidence Interval

We want to find the 95% confidence interval using the formula: \[\text{point estimate} \pm 1.96 \times \text{SE}\] where SE is the standard error.

This can be written as \[ \begin{aligned} 0.67 & \pm 1.96 \times \sqrt{\frac{0.67 (1-0.67)}{850}} \\ 0.67 & \pm 1.96 \times 0.0161 \\ & \longrightarrow (0.67-0.0316,0.67+0.0316) \\ & \longrightarrow (0.6384,0.7016) \end{aligned} \]

Thus, the 95% confidence interval for the true \(p\) runs from 0.6384 to 0.7016.
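The arithmetic above can be checked with a short Python sketch. The helper name `proportion_ci` is an assumption made for this example; the inputs (0.67, 850, and 1.96) come from the case study.

```python
import math


def proportion_ci(p_hat, n, z_star=1.96):
    """Return (lower, upper, se) for: point estimate +/- z_star * SE."""
    # Standard error of a sample proportion.
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z_star * se
    return p_hat - margin, p_hat + margin, se


# Case Study I: 67% of 850 respondents.
lower, upper, se = proportion_ci(0.67, 850)
print(f"SE = {se:.4f}")                        # SE = 0.0161
print(f"95% CI = ({lower:.4f}, {upper:.4f})")  # 95% CI = (0.6384, 0.7016)
```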

Case Study I: Interpretation

Which of the following is the correct interpretation of this confidence interval? We are 95% confident that:

  1. 64% to 70% of American Facebook users in this sample think Facebook categorizes their interests accurately.
  2. 64% to 70% of all American Facebook users think Facebook categorizes their interests accurately.
  3. there is a 64% to 70% chance that a randomly chosen American Facebook user’s interests are categorized accurately.
  4. there is a 64% to 70% chance that 95% of American Facebook users’ interests are categorized accurately.

\(\star\) 64% to 70% of all American Facebook users think Facebook categorizes their interests accurately.

\(\dagger\) Why do we interpret the confidence interval this way?

What does 95% Confident Mean?

Suppose we took many samples and built a confidence interval from each sample using the equation \[\text{point estimate} \pm 1.96 \times \text{standard error}.\]

Then about 95% of those intervals would contain the true population proportion (\(p\)).
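This repeated-sampling idea can be illustrated with a small simulation. It is only a sketch: in practice the true proportion is unknown, so the value `p_true = 0.67`, the number of repeated samples, and the random seed are all assumptions chosen for the demonstration.

```python
import math
import random


def simulate_coverage(p_true=0.67, n=850, n_samples=2000, z_star=1.96, seed=1):
    """Share of simulated intervals that contain the (assumed) true proportion."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # Draw one sample of n yes/no responses and compute its proportion.
        successes = sum(rng.random() < p_true for _ in range(n))
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)
        # Count the interval as a hit when it captures the true proportion.
        if p_hat - z_star * se <= p_true <= p_hat + z_star * se:
            hits += 1
    return hits / n_samples


coverage = simulate_coverage()
print(f"Share of intervals containing p: {coverage:.3f}")
```

The printed share should land close to 0.95, though it will vary a little with the seed and the number of simulated samples.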

Width of an interval

If we want to be more certain that we capture the population parameter, i.e. increase our confidence level, should we use a wider interval or a narrower interval?

\(\star\) A wider interval.

Can you see any drawbacks to using a wider interval?

\(\star\) If the interval is too wide it may not be very informative.

Changing the Confidence Level

\[\text{point estimate} \pm z^{\star} \times \text{SE}.\]
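To see how the confidence level changes the interval, the sketch below recomputes the Case Study I interval at three levels. The levels 90%, 95%, and 99% and the helper name `z_star_for` are assumptions for illustration; `NormalDist` comes from Python's standard `statistics` module.

```python
import math
from statistics import NormalDist


def z_star_for(confidence):
    """Cutoff z* such that the middle `confidence` area lies in (-z*, z*)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


# Case Study I values.
p_hat, n = 0.67, 850
se = math.sqrt(p_hat * (1 - p_hat) / n)

widths = {}
for confidence in (0.90, 0.95, 0.99):
    z_star = z_star_for(confidence)
    margin = z_star * se
    widths[confidence] = 2 * margin
    print(f"{confidence:.0%}: z* = {z_star:.3f}, "
          f"interval = ({p_hat - margin:.4f}, {p_hat + margin:.4f})")
```

A higher confidence level gives a larger \(z^{\star}\) and therefore a wider interval, which is the trade-off between certainty and informativeness.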

95% Confidence Interval

Example 1

Which of the below Z scores is the appropriate \(z^{\star}\) when calculating a 99.7% confidence interval?

  1. Z = 2.05
  2. Z = 2.33
  3. Z = 2.97
  4. Z = 1.96

\(\star\) We can estimate \(z^{\star}\) using the 68-95-99.7 rule. We know that \(P(-3 \le Z \le 3) \approx 0.997\), so \(z^{\star} \approx 3\) and the closest answer is \(Z = 2.97\).
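The rule-of-thumb answer can be compared against the standard normal quantile. This is a sketch using `statistics.NormalDist` from the Python standard library; the variable names are assumptions for this example.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

# For a 99.7% interval, 0.3% is split evenly between the two tails,
# so z* is the quantile with 0.9985 of the area to its left.
confidence = 0.997
z_star = std_normal.inv_cdf(1 - (1 - confidence) / 2)

# Pick the quiz option closest to the computed cutoff.
options = [2.05, 2.33, 2.97, 1.96]
closest = min(options, key=lambda z: abs(z - z_star))

print(f"z* = {z_star:.2f}")         # z* = 2.97
print(f"closest option: {closest}")  # closest option: 2.97
```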

Example 1: 99.7% Confidence Interval

Activity: Determine and Interpret Confidence Intervals

  1. Make sure you have a copy of the F 3/14 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/
Speegle, D., & Clair, B. (2021). Probability, statistics, and data: A fresh approach using R. Chapman and Hall/CRC. https://probstatsdata.com/