Point Estimates and Sampling Variability

Elementary Statistics

MTH-161D | Spring 2025 | University of Portland

March 12, 2025

Objectives

Introduce the Central Limit Theorem (CLT)
Know how to determine a point estimate
Develop an understanding of the sampling distribution
Activity: Understanding the CLT

These slides are derived from Diez et al. (2012).

Previously… (1/3)

The guiding principle of statistics is statistical thinking.

Statistical Thinking in the Data Science Life Cycle

Previously… (2/3)

Types of Inference

	Parameter Estimation	Hypothesis Testing
Goal	Estimate an unknown population value	Assess claims about a population value
Methods	Point Estimation: A single value estimate (e.g., sample mean); Interval Estimation: A range of plausible values (e.g., confidence interval)	State a null and an alternative hypothesis; compute a test statistic and compare it to a threshold (p-value or critical value)
Key Concept	Focuses on precision in estimation (confidence intervals)	Focuses on decision-making based on evidence (reject or fail to reject the null hypothesis)

Previously… (3/3)

The normal r.v. \(X \sim \text{N}(\mu,\sigma^2)\) has infinitely many possible outcomes (an infinite sample space), where \(\mu\) is the mean and \(\sigma^2\) is the variance (\(\sigma\) is the standard deviation). Its PDF is the continuous curve shown below.

The Central Limit Theorem (CLT)

Key idea of the Central Limit Theorem (CLT). Image source: Medium–AI/Data Science Digest

\(\star\) Key Idea: The CLT says that the sample mean (or sum) of many independent and identically distributed random variables with finite variance is approximately normally distributed, regardless of the shape of the original distribution.
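The key idea can be checked with a small simulation. The sketch below is only an illustration under stated assumptions: the Exponential(1) population (right-skewed, mean 1, standard deviation 1), the sample size of 50, the 2000 repetitions, and the seed are all choices made here, not values from the slides.

```python
import random
import statistics

def simulate_sample_means(n, reps, rng):
    """Draw `reps` samples of size `n` from Exponential(1); return each sample's mean."""
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]

rng = random.Random(161)  # arbitrary fixed seed so the run is repeatable
means = simulate_sample_means(n=50, reps=2000, rng=rng)

center = statistics.fmean(means)  # expected to be near the population mean, 1
spread = statistics.stdev(means)  # expected to be near 1/sqrt(50), about 0.14

# For a normal distribution, about 68% of values lie within 1 SD of the center.
within_one_sd = sum(abs(m - center) <= spread for m in means) / len(means)

print(f"mean of sample means:   {center:.3f}")
print(f"SD of sample means:     {spread:.3f}")
print(f"fraction within 1 SD:   {within_one_sd:.3f}")
```

Even though the individual draws are strongly skewed, the fraction of sample means within one standard deviation of their center should land close to 0.68, as it would for a normal distribution.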

Parameter Estimation

We are often interested in population parameters.
Since complete populations are difficult (or impossible) to collect data on, we use sample statistics as point estimates for the unknown population parameters of interest.
Sample statistics vary from sample to sample.
Quantifying how sample statistics vary provides a way to estimate the margin of error associated with our point estimate.

Example 1

If we randomly sample 1,000 adults from each U.S. state, would the sample means of their heights be:

very different
the same
not the same, but only somewhat different

\(\star\) The answer is "not the same, but only somewhat different," because of sampling variability.

Example 2

Suppose the proportion of American adults who support the expansion of solar energy is \(p = 0.88\).

Is the provided \(p\) a population parameter or a sample statistic?
Is a randomly selected American adult more likely than not to support the expansion of solar energy?

\(\star\) \(p=0.88\) is a population parameter because it describes all American adults. Since \(0.88 > 0.5\), a randomly selected American adult is more likely than not to support solar energy expansion.

Example 2: Unknown Population Parameter

Suppose that you don’t have access to the population of all American adults, which is a quite likely scenario. In order to estimate the proportion of American adults who support solar power expansion, you might sample from the population and use your sample proportion as the best guess for the unknown population proportion.

Sample, with replacement, 1000 American adults from the population, and record whether or not they support solar power expansion.
Find the sample proportion.
Plot the distribution of the sample proportions.

\(\star\) Key Idea: After repeating the sampling process described above many times, the resulting distribution of sample proportions will be approximately normal.
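The three steps above can be sketched in code. This is a simulation under assumptions, not a real survey: it pretends the population proportion \(p = 0.88\) is known so that samples can be generated, and the number of repetitions (500) and the seed are choices made here. The comparison value \(\sqrt{p(1-p)/n}\) is the standard formula for the standard deviation of a sample proportion; it is not stated in these slides.

```python
import math
import random
import statistics

P_TRUE = 0.88   # population proportion from Example 2
N = 1000        # adults per sample, as in the procedure above
REPS = 500      # number of repeated samples (an assumption for this sketch)

def sample_proportion(p, n, rng):
    """Simulate n independent adults and return the fraction who support expansion."""
    supporters = sum(rng.random() < p for _ in range(n))
    return supporters / n

rng = random.Random(2025)  # arbitrary fixed seed for repeatability
p_hats = [sample_proportion(P_TRUE, N, rng) for _ in range(REPS)]

mean_p_hat = statistics.fmean(p_hats)   # center of the sampling distribution
sd_p_hat = statistics.stdev(p_hats)     # spread of the sampling distribution
theory_sd = math.sqrt(P_TRUE * (1 - P_TRUE) / N)  # about 0.0103

print(f"mean of sample proportions: {mean_p_hat:.4f}")
print(f"SD of sample proportions:   {sd_p_hat:.4f}")
print(f"sqrt(p(1-p)/n):             {theory_sd:.4f}")
```

The sample proportions should cluster tightly around 0.88, and their spread should be close to the theoretical value, which is what makes a single sample proportion a reasonable point estimate.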

Example 2: Point Estimate

\(\dagger\) Based on this distribution, what do you think is the true population proportion?

Sampling Distributions

Sampling distributions are never observed

In real-world applications, we never actually observe the sampling distribution, yet it is useful to always think of a point estimate as coming from such a hypothetical distribution.

\(\star\) Key Idea: Understanding the sampling distribution will help us characterize and make sense of the point estimates that we do observe.

The Normal Distribution (Revisited)

The 68-95-99.7 Rule (1/3)

Within 1 standard deviation of the mean

\[P(\mu - \sigma \le X \le \mu + \sigma) \approx 0.68\]

The 68-95-99.7 Rule (2/3)

Within 2 standard deviations of the mean

\[P(\mu - 2\sigma \le X \le \mu + 2\sigma) \approx 0.95\]

The 68-95-99.7 Rule (3/3)

Within 3 standard deviations of the mean

\[P(\mu - 3\sigma \le X \le \mu + 3\sigma) \approx 0.997\]
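The three probabilities in the 68-95-99.7 rule can be checked numerically. The sketch below uses Python's standard-library `statistics.NormalDist`; the helper name `prob_within` is introduced here for illustration only.

```python
from statistics import NormalDist

def prob_within(k, mu=0.0, sigma=1.0):
    """P(mu - k*sigma <= X <= mu + k*sigma) for X ~ N(mu, sigma^2)."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

rule = {k: prob_within(k) for k in (1, 2, 3)}
for k, prob in rule.items():
    print(f"within {k} SD: {prob:.4f}")
# within 1 SD: 0.6827
# within 2 SD: 0.9545
# within 3 SD: 0.9973
```

The result does not depend on the particular \(\mu\) and \(\sigma\): calling `prob_within(2, mu=70, sigma=3)` gives the same value as `prob_within(2)`.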

Total Area Under the Curve

The Normal PDF satisfies the probability axioms

\[P(-\infty < X < \infty) = 1\]

\(\star\) Key Idea: Because the total probability over the whole sample space must equal 1, the total area under the Normal PDF is always 1.
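As a rough numerical check, the area under a normal PDF can be approximated with the trapezoid rule. This is a sketch under assumptions: the values \(\mu = 70\) and \(\sigma = 3\) are hypothetical, and the integration stops at 8 standard deviations on each side, where the remaining tail area is negligible.

```python
from statistics import NormalDist

def area_under_pdf(dist, k=8.0, steps=16_000):
    """Trapezoid-rule area under dist.pdf over [mean - k*sd, mean + k*sd]."""
    lo = dist.mean - k * dist.stdev
    width = 2 * k * dist.stdev / steps
    # Endpoints get half weight; interior points get full weight.
    total = 0.5 * (dist.pdf(lo) + dist.pdf(lo + steps * width))
    total += sum(dist.pdf(lo + i * width) for i in range(1, steps))
    return total * width

area = area_under_pdf(NormalDist(mu=70, sigma=3))  # hypothetical mu and sigma
print(f"approximate total area: {area:.6f}")
```

Changing `mu` or `sigma` shifts or stretches the curve, but the computed area should stay at 1 up to rounding.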

Standard Normal Distribution (1/2)

The standard normal distribution is the normal distribution with \(\mu=0\) and \(\sigma=1\), written \(Z \sim \text{N}(0,1)\).

The transformation formula (the z-score)

A z-score is a standardized score that measures how many standard deviations a value is from the mean. \[Z = \frac{X - \mu}{\sigma}\]
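A short worked example of the transformation follows. The numbers (a mean of 170, a standard deviation of 10, and an observed value of 185) are hypothetical and chosen only to make the arithmetic easy to follow; the function name `z_score` is likewise introduced here.

```python
from statistics import NormalDist

def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return (x - mu) / sigma

mu, sigma = 170.0, 10.0   # hypothetical population mean and standard deviation
x = 185.0                 # hypothetical observed value

z = z_score(x, mu, sigma)                # (185 - 170) / 10 = 1.5
p_original = NormalDist(mu, sigma).cdf(x)  # P(X <= 185) on the original scale
p_standard = NormalDist(0, 1).cdf(z)       # P(Z <= 1.5) on the standard scale

print(f"z = {z}")
print(f"P(X <= {x}) = {p_original:.4f}")
print(f"P(Z <= {z}) = {p_standard:.4f}")
```

The two probabilities agree, which is the point of standardizing: a question about any normal variable can be answered on the standard normal scale.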

Standard Normal Distribution (2/2)

The standard normal distribution, \(Z \sim \text{N}(0,1)\).

\(\star\) Key Idea: The standard normal distribution is a normal distribution with a mean of 0 and a standard deviation of 1. It serves as a reference distribution, allowing any normally distributed variable to be standardized.

CLT Conditions

Independence – Sample values must be independent
Identical Distribution – Variables should be from the same distribution
Finite Variance – The population must have a finite variance
Large Sample Size – A larger sample size improves approximation
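The sample-size condition can be illustrated by measuring how lopsided the distribution of sample means is for several sample sizes. This is a sketch under assumptions: the Exponential(1) population, the sample sizes 2, 10, and 50, the 3000 repetitions, and the seed are choices made here. Skewness is used as a simple measure of asymmetry; a normal distribution has skewness 0.

```python
import random
import statistics

def skewness(values):
    """Sample skewness: third central moment divided by variance^(3/2)."""
    m = statistics.fmean(values)
    m2 = statistics.fmean((v - m) ** 2 for v in values)
    m3 = statistics.fmean((v - m) ** 3 for v in values)
    return m3 / m2 ** 1.5

rng = random.Random(312)  # arbitrary fixed seed for repeatability
REPS = 3000               # repeated samples per sample size (an assumption)

skew_by_n = {}
for n in (2, 10, 50):
    # Each entry of `means` is the mean of one sample of size n.
    means = [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(REPS)
    ]
    skew_by_n[n] = skewness(means)
    print(f"n = {n:>2}: skewness of sample means = {skew_by_n[n]:.2f}")
```

The skewness should shrink toward 0 as the sample size grows, meaning the distribution of sample means looks more symmetric and closer to normal.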

Extending the Framework for other Descriptive Statistics

The strategy of using a sample statistic to estimate a parameter is quite common, and it’s a strategy that we can apply to other statistics besides a proportion.

Example:

Take a random sample of students at a college and ask them how many extracurricular activities they are involved in, in order to estimate the average number (or median number) of extracurricular activities that all students at this college are involved in.

\(\star\) Key Idea: The principles and general ideas of the CLT apply to other parameters as well, even if the details change a little.
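The extracurricular example can be sketched as a simulation. Everything about the population here is hypothetical: the possible counts (0 to 5), their weights, the sample size of 100 students, and the 1000 repetitions are invented for illustration and are not data from any college.

```python
import random
import statistics

# Hypothetical population: number of activities and how common each count is.
COUNTS = [0, 1, 2, 3, 4, 5]
WEIGHTS = [10, 25, 30, 20, 10, 5]   # relative frequencies (sum to 100)
POP_MEAN = sum(c * w for c, w in zip(COUNTS, WEIGHTS)) / sum(WEIGHTS)  # 2.1

rng = random.Random(146)  # arbitrary fixed seed for repeatability

def one_sample(n):
    """Randomly sample n students from the hypothetical population."""
    return rng.choices(COUNTS, weights=WEIGHTS, k=n)

# Point estimates from repeated samples of 100 students each.
sample_means = []
sample_medians = []
for _ in range(1000):
    sample = one_sample(100)
    sample_means.append(statistics.fmean(sample))
    sample_medians.append(statistics.median(sample))

print(f"population mean:          {POP_MEAN}")
print(f"average of sample means:  {statistics.fmean(sample_means):.3f}")
print(f"SD of sample means:       {statistics.stdev(sample_means):.3f}")
print(f"typical sample median:    {statistics.median(sample_medians)}")
```

The sample means vary from sample to sample but center on the population mean. The sample medians also vary, though because the counts are whole numbers their distribution is lumpier, which is one way the details change for other statistics.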

Activity: Understanding the CLT

Make sure you have a copy of the W 3/12 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
Get together with another student.
Discuss your results.
Submit your worksheet on Moodle as a .pdf file.