Sampling Principles and Strategies

Elementary Statistics

MTH-161D | Spring 2025 | University of Portland

January 29, 2025

Objectives

These slides are derived from Diez et al. (2012).

Previously… (1/2)

Types of Variables


Previously… (2/2)

The guiding principle of statistics is statistical thinking.

Statistical Thinking in the Data Science Life Cycle


Case Study I: Population vs Sample

Research Question: Can people become better, more efficient runners on their own, merely by running?

\(\star\) Results from this random sample can be generalized only to adult women, not to all people.

Case Study II: Anecdotal Evidence

Research Question: Can smoking contribute to negative health outcomes?

\(\star\) Anecdotal evidence refers to information or conclusions drawn from personal experiences, individual stories, or isolated examples rather than systematic data collection or rigorous scientific analysis.

Case Study II: Early Smoking Research

Research Question: Can smoking contribute to negative health outcomes?

\(\star\) Overgeneralization is a logical fallacy where a conclusion is drawn from insufficient or unrepresentative evidence, applying it too broadly.

Census

Wouldn’t it be better to just include everyone and “sample” the entire population?

This is called a census.

Problems with taking a census:

Sampling Bias

What is sampling bias? It occurs when the sample collected for a study or survey is not representative of the larger population that the study aims to analyze.
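A small simulation can make this definition concrete. The sketch below is illustrative only: the population, the "reachable" subgroup, and all counts are hypothetical numbers chosen for the example, not data from the course or the textbook. It compares a sample drawn from the whole population with one drawn only from the people a survey can reach.

```python
import random

rng = random.Random(161)  # fixed seed so the draws are reproducible

# Hypothetical population of 1000 people, stored as (reachable, supports).
# 600 are reachable by the survey and 420 of them support a proposal;
# 400 are not reachable and 120 of them support it.
population = (
    [(True, True)] * 420 + [(True, False)] * 180
    + [(False, True)] * 120 + [(False, False)] * 280
)

def support_rate(people):
    """Proportion of people in the list who support the proposal."""
    return sum(1 for _, supports in people if supports) / len(people)

true_rate = support_rate(population)  # 540 / 1000 = 0.54

# Representative sample: every person can be selected.
fair_sample = rng.sample(population, k=200)

# Biased sample: only reachable people can be selected.
reachable = [person for person in population if person[0]]
biased_sample = rng.sample(reachable, k=200)

print("population:", true_rate)
print("fair sample:", support_rate(fair_sample))
print("biased sample:", support_rate(biased_sample))
```

Under these assumptions the biased sample tends to land near 0.70, the support rate among reachable people, rather than near the population value of 0.54. Taking a larger sample from the same restricted group would not remove that gap.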

There are several ways sampling bias can occur, such as:

Case Study III: Sampling Bias (1/2)

NPR: Illegal Immigrants Reluctant to Fill Out Census Form

NPR: See 200 Years Of Twists And Turns Of Census Citizenship Questions

Case Study III: Sampling Bias (2/2)

US Census: Citizenship Question Effects on Household Survey Response

Exploratory Analysis to Inference

What is Exploratory Analysis? It is the process of analyzing and summarizing datasets to uncover patterns, trends, relationships, and anomalies before inference.

Inference

What is inference? It is the process of drawing conclusions about a population based on sample data. This involves using data from a sample to make generalizations, predictions, or decisions about a larger group.

Obtaining Good Samples

Simple Random Sample

What is simple random sampling? Randomly select cases from the population so that every case has an equal chance of being chosen and there is no implied connection between the cases that are selected.
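As a minimal sketch of a simple random sample, the Python snippet below draws 25 cases from a hypothetical population of 500 ID numbers. The population size, sample size, and seed are assumptions made for illustration.

```python
import random

# Hypothetical population: ID numbers 1..500 standing in for individuals.
population = list(range(1, 501))

rng = random.Random(161)  # fixed seed so the draw is reproducible

# sample() draws without replacement, so no ID is picked twice and
# every ID has the same chance of being selected.
srs = rng.sample(population, k=25)

print(sorted(srs))
```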

Stratified Sample

What is stratified sampling? Strata are made up of similar observations. We take a simple random sample from each stratum.
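One way to sketch stratified sampling in Python is shown below. The population of 300 students, the class-year strata, and the 10% sampling fraction are all hypothetical choices for this example.

```python
import random

rng = random.Random(161)  # fixed seed so the draw is reproducible

# Hypothetical population: 300 students labeled by class year (the strata).
population = (
    [("first-year", i) for i in range(120)]
    + [("sophomore", i) for i in range(90)]
    + [("junior", i) for i in range(60)]
    + [("senior", i) for i in range(30)]
)

# Group the cases by stratum.
strata = {}
for year, student_id in population:
    strata.setdefault(year, []).append((year, student_id))

# Take a simple random sample of 10% of the cases in each stratum.
stratified_sample = []
for year, members in strata.items():
    n_from_stratum = len(members) // 10
    stratified_sample.extend(rng.sample(members, k=n_from_stratum))

print(len(stratified_sample), "students sampled")
```

Because a separate simple random sample is taken inside every stratum, each class year is guaranteed to appear in the final sample.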

Cluster Sample

What is cluster sampling? Clusters are usually not made up of homogeneous observations. We take a simple random sample of clusters, and then sample all observations in those clusters.
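Cluster sampling can be sketched in a few lines of Python. The 12 neighborhoods of 20 households each are hypothetical, as is the choice to sample 3 clusters.

```python
import random

rng = random.Random(161)  # fixed seed so the draw is reproducible

# Hypothetical population: 12 neighborhoods (clusters) of 20 households each.
clusters = {
    f"neighborhood_{c}": [f"n{c}_household_{h}" for h in range(20)]
    for c in range(12)
}

# Stage 1: take a simple random sample of clusters.
chosen_clusters = rng.sample(sorted(clusters), k=3)

# Stage 2: include every observation in the chosen clusters.
cluster_sample = [
    household for name in chosen_clusters for household in clusters[name]
]

print(chosen_clusters)
print(len(cluster_sample), "households sampled")
```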

Multistage Sample

What is multistage sampling? Clusters are usually not made up of homogeneous observations. We take a simple random sample of clusters, and then take a simple random sample of observations from the sampled clusters.
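The sketch below illustrates multistage sampling under hypothetical assumptions: 12 neighborhoods of 20 households, 3 neighborhoods chosen at random, and then 5 households chosen at random within each. It differs from cluster sampling only in the second stage.

```python
import random

rng = random.Random(161)  # fixed seed so the draw is reproducible

# Hypothetical population: 12 neighborhoods (clusters) of 20 households each.
clusters = {
    f"neighborhood_{c}": [f"n{c}_household_{h}" for h in range(20)]
    for c in range(12)
}

# Stage 1: take a simple random sample of clusters.
chosen_clusters = rng.sample(sorted(clusters), k=3)

# Stage 2: take a simple random sample of observations
# within each chosen cluster, instead of keeping all of them.
multistage_sample = []
for name in chosen_clusters:
    multistage_sample.extend(rng.sample(clusters[name], k=5))

print(len(multistage_sample), "households sampled")
```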

Case Study VI: Stratified vs Clustered Sampling

Scenario: A hospital wants to survey nurse job satisfaction across departments.

| Sampling Method | Process | Example |
|---|---|---|
| Stratified | Divide nurses into departments (strata) and sample proportionally from each group. | 8 emergency, 6 ICU, 10 pediatrics, 16 general medicine |
| Clustered | Divide hospital into floors (clusters) and randomly select entire floors, surveying all nurses there. | Select 2 random floors and survey all nurses on those floors. |
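To see how "sample proportionally" could produce the counts in the stratified example, here is a minimal sketch of proportional allocation. The department headcounts and the total sample size of 40 are hypothetical values chosen so that the result matches the example. The scenario itself does not state them.

```python
# Hypothetical number of nurses in each department (not given in the scenario).
staff = {
    "emergency": 80,
    "ICU": 60,
    "pediatrics": 100,
    "general medicine": 160,
}

total_sample = 40  # assumed total number of nurses to survey
total_staff = sum(staff.values())

# Each department's share of the sample matches its share of all nurses.
# Integer division is exact for these numbers; other headcounts would
# need a rounding rule.
allocation = {
    dept: total_sample * count // total_staff for dept, count in staff.items()
}

print(allocation)
```

With these assumed headcounts every department is sampled at the same 10% rate, giving 8 emergency, 6 ICU, 10 pediatrics, and 16 general medicine nurses.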

Key Difference:

Activity: Identify the Sampling Method

  1. Make sure you have a copy of the W 1/29 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/