Previously… (2/3)

Types of Inference

	Parameter Estimation	Hypothesis Testing
Goal	Estimate an unknown population value	Assess claims about a population value
Methods	Point estimation: a single-value estimate (e.g., the sample mean); interval estimation: a range of plausible values (e.g., a confidence interval)	State a null and an alternative hypothesis; compute a test statistic and compare it to a critical value (equivalently, compare the p-value to a significance level)
Key Concept	Focuses on precision in estimation (confidence intervals)	Focuses on decision-making based on evidence (reject or fail to reject the null hypothesis)

Previously… (3/3)

The standard normal distribution is the normal distribution with \(\mu=0\) and \(\sigma=1\), written \(Z \sim \text{N}(0,1)\).

The transformation formula (the z-score)

Z-scores are standardized scores that measure how many standard deviations a value \(X\) lies from the mean: \[Z = \frac{X - \mu}{\sigma}\]
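As a quick illustration of the formula, here is a minimal R sketch. The values \(X = 85\), \(\mu = 70\), and \(\sigma = 10\) are hypothetical, chosen only for the example. It computes the z-score by hand and then uses pnorm() to find the area below it under \(N(0,1)\).

```r
# Hypothetical values, chosen only for illustration
x     <- 85   # observed value
mu    <- 70   # population mean
sigma <- 10   # population standard deviation

z <- (x - mu) / sigma   # number of SDs that x lies above the mean
z                       # 1.5

pnorm(z, mean = 0, sd = 1)  # P(Z <= 1.5) under N(0,1), about 0.933
```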

The standard normal distribution, \(Z \sim \text{N}(0,1)\).

Confidence Intervals

A plausible range of values for the population parameter is called a confidence interval.

Analogy

Using only a sample statistic to estimate a parameter is like fishing in a murky lake with a spear, and using a confidence interval is like fishing with a net.
We can throw a spear where we saw a fish, but we will probably miss. If we toss a net in that area, we have a good chance of catching the fish.

\(\star\) Key Idea: If we report a point estimate, we probably won’t hit the exact population parameter. If we report a range of plausible values, we have a good shot at capturing the parameter.

Case Study I

Facebook’s categorization of user interests

Most commercial websites (e.g., social media platforms, news outlets, online retailers) collect data about their users’ behaviors and use these data to deliver targeted content, recommendations, and ads.

To understand whether Americans think their lives line up with how algorithm-driven classification systems categorize them, Pew Research asked a representative sample of 850 American Facebook users how accurately the list of categories on Facebook’s page of their supposed interests actually represents them and their interests. 67% of the respondents said that the listed categories were accurate.

Estimate the true proportion of American Facebook users who think Facebook categorizes their interests accurately.

Case Study I: Point Estimate and Standard Error

The goal here is to go beyond the point estimate and find a range of plausible values for the parameter (a confidence interval).

Given information

\(\hat{p} = 0.67 \longleftarrow \text{point estimate}\)
\(n = 850 \longleftarrow \text{sample size}\)
- In the sample, the number of users who think Facebook categorizes their interests accurately is about \(850 \times 0.67 = 569.5\); since 67% is rounded, the actual count is 569 or 570.
- The remaining \(850 \times 0.33 = 280.5\) users (280 or 281) think otherwise.
Let \(p\) be the true population proportion and \(\hat{p}\) be the sample proportion.

The Confidence Interval

We want to find the 95% confidence interval using the formula \[\text{point estimate} \pm 1.96 \times \text{SE},\] where SE is the standard error. For a sample proportion, \(\text{SE} = \sqrt{\hat{p}(1-\hat{p})/n}\).

This can be written as \[ \begin{aligned} 0.67 & \pm 1.96 \times \sqrt{\frac{0.67 (1-0.67)}{850}} \\ 0.67 & \pm 1.96 \times 0.0161 \\ & \longrightarrow (0.67-0.0316,0.67+0.0316) \\ & \longrightarrow (0.6384,0.7016) \end{aligned} \]

Thus, the 95% confidence interval for the true proportion \(p\) is \((0.6384, 0.7016)\).
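The arithmetic above can be reproduced in base R. This is a minimal sketch; the variable names are our own choices, not from the source.

```r
p_hat <- 0.67   # sample proportion (point estimate)
n     <- 850    # sample size

se <- sqrt(p_hat * (1 - p_hat) / n)  # standard error of p_hat, about 0.0161
me <- 1.96 * se                      # margin of error, about 0.0316
ci <- c(lower = p_hat - me, upper = p_hat + me)
round(ci, 4)                         # 0.6384 and 0.7016
```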

Case Study I: Interpretation

Which of the following is the correct interpretation of this confidence interval? We are 95% confident that:

64% to 70% of American Facebook users in this sample think Facebook categorizes their interests accurately.
64% to 70% of all American Facebook users think Facebook categorizes their interests accurately.
there is a 64% to 70% chance that a randomly chosen American Facebook user’s interests are categorized accurately.
there is a 64% to 70% chance that 95% of American Facebook users’ interests are categorized accurately.

\(\star\) 64% to 70% of all American Facebook users think Facebook categorizes their interests accurately.

\(\dagger\) Why do we interpret the confidence interval this way?

What does 95% Confident Mean?

Suppose we took many samples and built a confidence interval from each sample using the equation \[\text{point estimate} \pm 1.96 \times \text{standard error}.\]

Then about 95% of those intervals would contain the true population proportion (\(p\)).
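A small simulation makes this concrete. The sketch below assumes, purely for illustration, that the true proportion is \(p = 0.67\) (in practice \(p\) is unknown). It draws 10,000 samples of size 850, builds a 95% interval from each, and records the fraction of intervals that contain \(p\).

```r
set.seed(2024)         # for reproducibility
p_true <- 0.67         # ASSUMED true proportion, for illustration only
n      <- 850          # sample size, as in Case Study I
reps   <- 10000        # number of repeated samples

p_hat <- rbinom(reps, size = n, prob = p_true) / n  # one p_hat per sample
se    <- sqrt(p_hat * (1 - p_hat) / n)              # SE for each sample
lower <- p_hat - 1.96 * se
upper <- p_hat + 1.96 * se

coverage <- mean(lower <= p_true & p_true <= upper) # share of intervals capturing p
coverage                                            # close to 0.95
```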

Width of an interval

If we want to be more certain that we capture the population parameter, i.e., increase our confidence level, should we use a wider or a narrower interval?

\(\star\) A wider interval.

Can you see any drawbacks to using a wider interval?

\(\star\) If the interval is too wide it may not be very informative.

Changing the Confidence Level

\[\text{point estimate} \pm z^{\star} \times \text{SE}.\]

In a confidence interval, \(z^{\star} \times \text{SE}\) is called the margin of error; for a given sample, the margin of error changes as the confidence level changes.
To change the confidence level, we adjust \(z^{\star}\) in the formula above.
Commonly used confidence levels in practice are 90%, 95%, 98%, and 99%.
For a 95% confidence interval, \(z^{\star} = 1.96\).
More generally, we can find the appropriate \(z^{\star}\) for any confidence level from the standard normal distribution.
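To see how \(z^{\star}\) and the margin of error grow with the confidence level, here is a short R sketch. It finds \(z^{\star}\) as the value that leaves \((1 - \text{confidence level})/2\) in the upper tail, and reuses the Case Study I standard error; the table layout is our own choice.

```r
cl     <- c(0.90, 0.95, 0.98, 0.99)     # common confidence levels
z_star <- qnorm(1 - (1 - cl) / 2)       # about 1.645, 1.960, 2.326, 2.576
se     <- sqrt(0.67 * (1 - 0.67) / 850) # SE from Case Study I

# Higher confidence -> larger z_star -> larger margin of error
data.frame(cl, z_star = round(z_star, 3), margin = round(z_star * se, 4))
```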

95% Confidence Interval

Example 1

Which of the Z-scores below is the appropriate \(z^{\star}\) when calculating a 99.7% confidence interval?

Z = 2.05
Z = 2.33
Z = 2.97
Z = 1.96

\(\star\) We can estimate \(z^{\star}\) with the 68-95-99.7 rule: since \(P(-3 \le Z \le 3) \approx 0.997\), \(z^{\star} \approx 3\), so the closest answer is \(Z = 2.97\).
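The rule-of-thumb answer can be checked in R. This sketch computes \(P(-3 \le Z \le 3)\) with pnorm() and the exact \(z^{\star}\) for 99.7% confidence with qnorm().

```r
pnorm(3) - pnorm(-3)               # P(-3 <= Z <= 3), about 0.9973

cl     <- 0.997                    # confidence level
z_star <- qnorm(1 - (1 - cl) / 2)  # exact z star for 99.7% confidence
z_star                             # about 2.968, closest to 2.97
```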

Example 1: 99.7% Confidence Interval

Finding \(z^{\star}\) Exactly

Find the \(z^{\star}\) for a 92% confidence level.

Process:

Confidence level is 0.92.
The lower-tail area of \(Z \sim N(0,1)\) is \(\frac{1-0.92}{2} = 0.04\).
We want the \(z\)-score with lower-tail probability 0.04, i.e. \(P(Z \le z) = 0.04\); then \(z^{\star} = |z|\).
Use the qnorm() function in R.

Using R:

cl <- 0.92          # confidence level
lt <- (1 - cl) / 2  # lower tail probability
qnorm(lt, 0, 1)     # z-score with lower-tail area 0.04 (negative)

## [1] -1.750686

-qnorm(lt, 0, 1)    # z star is the positive value

## [1] 1.750686
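The same steps can be wrapped into a small helper that returns an interval for any confidence level. prop_ci() is a hypothetical name used only in this sketch, not a built-in R function. Here it is applied to Case Study I at the 92% and 95% levels.

```r
# Hypothetical helper (not part of base R): confidence interval for a proportion
prop_ci <- function(p_hat, n, cl = 0.95) {
  z_star <- -qnorm((1 - cl) / 2)          # positive critical value
  se     <- sqrt(p_hat * (1 - p_hat) / n) # standard error of p_hat
  c(lower = p_hat - z_star * se, upper = p_hat + z_star * se)
}

prop_ci(0.67, 850, cl = 0.92)  # narrower than the 95% interval
prop_ci(0.67, 850, cl = 0.95)  # about (0.6384, 0.7016)
```

Lowering the confidence level shrinks \(z^{\star}\), so the 92% interval is narrower than the 95% one.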

Activity: Determine and Interpret Confidence Intervals

Make sure you have a copy of the F 3/14 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
Get together with another student.
Discuss your results.
Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/

Speegle, D., & Clair, B. (2021). Probability, statistics, and data: A fresh approach using R. Chapman & Hall/CRC. https://probstatsdata.com/

Parameter Estimation

Applied Statistics

Objectives

Previously… (1/3)