Point Estimates and Sampling Variability

Elementary Statistics

MTH-161D | Spring 2025 | University of Portland

March 12, 2025

Objectives

These slides are derived from Diez et al. (2012).

Previously… (1/3)

The guiding principle of statistics is statistical thinking.

Statistical Thinking in the Data Science Life Cycle

Previously… (2/3)

Types of Inference

| | Parameter Estimation | Hypothesis Testing |
| --- | --- | --- |
| Goal | Estimate an unknown population value | Assess claims about a population value |
| Methods | Point Estimation: A single value estimate (e.g., sample mean). Interval Estimation: A range of plausible values (e.g., confidence interval). | State a null and an alternative hypothesis. Compute a test statistic and compare it to a threshold (p-value or critical value). |
| Key Concept | Focuses on precision in estimation (confidence intervals) | Focuses on decision-making based on evidence (reject or fail to reject the null hypothesis) |

Previously… (3/3)

The normal r.v. \(X \sim \text{N}(\mu,\sigma^2)\) has infinitely many possible outcomes (an infinite sample space), where \(\mu\) is the mean, \(\sigma^2\) is the variance, and \(\sigma\) is the standard deviation. Its PDF is given by the continuous curve below.

The Central Limit Theorem (CLT)

Key idea of the Central Limit Theorem (CLT). Image source: Medium–AI/Data Science Digest

\(\star\) Key Idea: The CLT says that the distribution of the sample mean (or sum) of many independent and identically distributed random variables with finite variance approaches a normal distribution as the sample size grows, regardless of the shape of the original distribution.
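The key idea can be checked with a small simulation. The sketch below is illustrative only: the Exponential(1) population, the sample size `n = 50`, and the number of repetitions `reps = 5000` are assumptions chosen for the demonstration, not values from these slides.

```python
import random
import statistics

random.seed(161)  # fixed seed so the run is repeatable


def sample_mean(n):
    """Mean of n independent draws from a right-skewed Exponential(1) population."""
    return statistics.fmean(random.expovariate(1.0) for _ in range(n))


# Repeat the sampling process many times to approximate the sampling distribution.
n = 50        # assumed sample size
reps = 5000   # assumed number of repeated samples
means = [sample_mean(n) for _ in range(reps)]

center = statistics.fmean(means)   # expected to be close to the population mean, 1
spread = statistics.stdev(means)   # expected to be close to 1 / sqrt(n)

# Share of sample means within one standard deviation of the center.
# A normal distribution would put roughly 68% of the values there.
within_one = sum(abs(m - center) <= spread for m in means) / reps

print(f"mean of sample means:      {center:.3f}")
print(f"std. dev. of sample means: {spread:.3f} (theory: {1 / n ** 0.5:.3f})")
print(f"share within 1 SD:         {within_one:.3f}")
```

Even though the population is strongly skewed, the sample means should cluster symmetrically around 1, with roughly 68% of them within one standard deviation of the center.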

Parameter Estimation

Example 1

If we randomly sample 1,000 adults from each U.S. state, would the sample means of their heights be:

  1. very different
  2. the same
  3. not the same, but only somewhat different

\(\star\) The answer is option 3: not the same, but only somewhat different. The sample means differ from one another because of sampling variability.
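A rough simulation of Example 1 shows how small that variability can be. This is a sketch under stated assumptions: every state is treated as a draw from the same hypothetical population of heights with mean 170 cm and standard deviation 10 cm, and neither number comes from these slides.

```python
import random
import statistics

random.seed(2025)  # fixed seed so the run is repeatable

# Hypothetical population of adult heights in cm; both values are assumptions.
POP_MEAN = 170.0
POP_SD = 10.0
N_STATES = 50
N_PER_STATE = 1000

# One sample mean per state, all drawn from the same assumed population.
state_means = [
    statistics.fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N_PER_STATE))
    for _ in range(N_STATES)
]

lowest = min(state_means)
highest = max(state_means)
print(f"smallest sample mean: {lowest:.2f} cm")
print(f"largest sample mean:  {highest:.2f} cm")
print(f"gap between them:     {highest - lowest:.2f} cm")
```

No two sample means should match exactly, yet with 1,000 adults per sample the gap between the smallest and largest mean stays small compared with the 10 cm spread of individual heights.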

Example 2

Suppose the proportion of American adults who support the expansion of solar energy is \(p = 0.88\).

  1. Is the provided \(p\) a population parameter or a sample statistic?
  2. Is a randomly selected American adult more or less likely to support the expansion of solar energy?

\(\star\) \(p=0.88\) is a population parameter because it describes all American adults. Since 88% is a high level of support, a randomly selected American adult is more likely than not to support the expansion of solar energy.

Example 2: Unknown Population Parameter

Suppose that you don’t have access to the population of all American adults, which is quite likely. To estimate the proportion of American adults who support solar power expansion, you might sample from the population and use your sample proportion as the best guess for the unknown population proportion.

\(\star\) Key Idea: If this sampling process is repeated many times, the resulting distribution of sample proportions will be approximately normal.
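The repeated-sampling idea can be sketched in code. The population proportion 0.88 comes from Example 2, while the sample size `N = 1000` and the number of repetitions `REPS = 1000` are assumptions made for illustration. The comparison value \(\sqrt{p(1-p)/n}\) is the standard error of a sample proportion, a standard result that these slides do not state.

```python
import random
import statistics

random.seed(88)  # fixed seed so the run is repeatable

P_TRUE = 0.88   # population proportion from Example 2
N = 1000        # assumed sample size
REPS = 1000     # assumed number of repeated samples


def sample_proportion(p, n):
    """Proportion of supporters in one simulated random sample of size n."""
    return sum(random.random() < p for _ in range(n)) / n


# Each entry is the point estimate from one simulated sample.
p_hats = [sample_proportion(P_TRUE, N) for _ in range(REPS)]

center = statistics.fmean(p_hats)
spread = statistics.stdev(p_hats)
theory_se = (P_TRUE * (1 - P_TRUE) / N) ** 0.5

print(f"mean of sample proportions:      {center:.4f}")
print(f"std. dev. of sample proportions: {spread:.4f}")
print(f"theoretical standard error:      {theory_se:.4f}")
```

The sample proportions should center near 0.88, and their standard deviation should be close to the theoretical standard error.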

Example 2: Point Estimate

\(\dagger\) Based on this distribution, what do you think is the true population proportion?

Sampling Distributions

Sampling distributions are never observed

\(\star\) Key Idea: Understanding the sampling distribution will help us characterize and make sense of the point estimates that we do observe.

The Normal Distribution (Revisited)

The normal r.v. \(X \sim \text{N}(\mu,\sigma^2)\) has infinitely many possible outcomes (an infinite sample space), where \(\mu\) is the mean, \(\sigma^2\) is the variance, and \(\sigma\) is the standard deviation. Its PDF is given by the continuous curve below.

The 68-95-99.7 Rule (1/3)

Within 1 standard deviation of the mean

\[P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.68\]

The 68-95-99.7 Rule (2/3)

Within 2 standard deviations of the mean

\[P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.95\]

The 68-95-99.7 Rule (3/3)

Within 3 standard deviations of the mean

\[P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.997\]
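The three probabilities can be verified numerically. This sketch uses `NormalDist` from Python's standard `statistics` module. The values `MU = 100` and `SIGMA = 15` are arbitrary assumptions, since the rule holds for any normal distribution.

```python
from statistics import NormalDist

MU, SIGMA = 100.0, 15.0  # arbitrary assumed values; any choice gives the same result
X = NormalDist(MU, SIGMA)


def prob_within(k):
    """P(mu - k*sigma <= X <= mu + k*sigma) computed from the normal CDF."""
    return X.cdf(MU + k * SIGMA) - X.cdf(MU - k * SIGMA)


for k in (1, 2, 3):
    print(f"P(within {k} SD of the mean) = {prob_within(k):.4f}")
```

To four decimal places the probabilities are 0.6827, 0.9545, and 0.9973, which is where the rounded figures 68, 95, and 99.7 come from.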

Total Area Under the Curve

The Normal PDF satisfies the probability axioms

\[P(-\infty < X < \infty) = 1\]

\(\star\) Key Idea: Because of the axiom that the sum of the probabilities for all outcomes in the sample space is equal to 1, the total area under the Normal PDF is always 1.
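As a numerical check, the area under a normal PDF can be approximated with the trapezoid rule. This is a sketch rather than a proof: `MU`, `SIGMA`, the integration range of 8 standard deviations on each side, and the number of steps are all assumptions chosen for illustration.

```python
from statistics import NormalDist

MU, SIGMA = 100.0, 15.0  # arbitrary assumed values
X = NormalDist(MU, SIGMA)


def area_under_pdf(lo, hi, steps=10_000):
    """Trapezoid-rule approximation of the area under the PDF between lo and hi."""
    width = (hi - lo) / steps
    total = 0.5 * (X.pdf(lo) + X.pdf(hi))  # endpoints get half weight
    for i in range(1, steps):
        total += X.pdf(lo + i * width)
    return total * width


# The tails beyond 8 standard deviations contribute a negligible amount.
area = area_under_pdf(MU - 8 * SIGMA, MU + 8 * SIGMA)
print(f"approximate total area: {area:.6f}")
```

The approximation should agree with 1 to many decimal places, whatever values of `MU` and `SIGMA` are chosen.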

Standard Normal Distribution (1/2)

The standard normal distribution is the normal distribution with \(\mu=0\) and \(\sigma=1\), written \(Z \sim \text{N}(0,1)\).

The transformation formula (the z-score)

A z-score is a standardized score that measures how many standard deviations a value is from the mean.

\[Z = \frac{X - \mu}{\sigma}\]
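A short worked example of the transformation follows. The numbers are hypothetical: a value of 130 from a normal distribution with mean 100 and standard deviation 15. The sketch also checks that the probability below the value is the same before and after standardizing.

```python
from statistics import NormalDist


def z_score(x, mu, sigma):
    """Number of standard deviations that x lies from the mean mu."""
    return (x - mu) / sigma


MU, SIGMA = 100.0, 15.0  # hypothetical mean and standard deviation
x = 130.0                # hypothetical observed value

z = z_score(x, MU, SIGMA)  # (130 - 100) / 15 = 2.0

# Standardizing does not change the probability of falling below the value.
p_original = NormalDist(MU, SIGMA).cdf(x)
p_standard = NormalDist(0, 1).cdf(z)

print(f"z = {z}")
print(f"P(X <= {x}) = {p_original:.4f}")
print(f"P(Z <= {z}) = {p_standard:.4f}")
```

Here the value sits 2 standard deviations above the mean, and both probabilities agree.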

Standard Normal Distribution (2/2)

The standard normal distribution, \(Z \sim \text{N}(0,1)\).

\(\star\) Key Idea: The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It serves as a reference distribution, allowing any normally distributed variable to be standardized.

CLT Conditions

Extending the Framework for other Descriptive Statistics

Example:

Take a random sample of students at a college and ask each of them how many extracurricular activities they are involved in. Use the responses to estimate the average (or median) number of extracurricular activities that all students at this college are involved in.

\(\star\) Key Idea: The principles and general ideas of CLT apply to other parameters as well, even if the details change a little.
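The same repeated-sampling experiment can be run for a sample median. The sketch below does not use the extracurricular data, which these slides do not provide. It substitutes a hypothetical continuous, right-skewed population (exponential with mean 2) so that the sample median can take many different values. The sample size `N = 100` and repetition count `REPS = 2000` are also assumptions.

```python
import math
import random
import statistics

random.seed(312)  # fixed seed so the run is repeatable

POP_MEAN = 2.0                        # assumed mean of the exponential population
POP_MEDIAN = POP_MEAN * math.log(2)   # median of an exponential distribution
N = 100                               # assumed sample size
REPS = 2000                           # assumed number of repeated samples

# One sample median per simulated sample.
medians = [
    statistics.median([random.expovariate(1 / POP_MEAN) for _ in range(N)])
    for _ in range(REPS)
]

center = statistics.fmean(medians)
spread = statistics.stdev(medians)

# A roughly normal sampling distribution puts about 68% within one SD of the center.
within_one = sum(abs(m - center) <= spread for m in medians) / REPS

print(f"population median:       {POP_MEDIAN:.3f}")
print(f"mean of sample medians:  {center:.3f}")
print(f"share within 1 SD:       {within_one:.3f}")
```

The sample medians should center near the population median and look roughly bell-shaped, although the formula for their spread differs from the one for the sample mean.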

Activity: Understanding the CLT

  1. Make sure you have a copy of the W 3/12 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/