MTH-161D | Spring 2025 | University of Portland
January 29, 2025
These slides are derived from Diez et al. (2012).
Types of Variables
The guiding principle of statistics is statistical thinking.
Statistical Thinking in the Data Science Life Cycle
Research Question: Can people become better, more efficient runners on their own, merely by running?
\(\star\) Results from the random sample can only be generalized to adult women, not to all people.
Research Question: Can smoking contribute to negative health outcomes?
\(\star\) Anecdotal evidence refers to information or conclusions drawn from personal experiences, individual stories, or isolated examples rather than systematic data collection or rigorous scientific analysis.
Research Question: Can smoking contribute to negative health outcomes?
\(\star\) Overgeneralization is a logical fallacy where a conclusion is drawn from insufficient or unrepresentative evidence, applying it too broadly.
Wouldn’t it be better to just include everyone and “sample” the entire population?
This is called a census.
Problems with taking a census:
What is sampling bias? It occurs when the sample collected for a study or survey is not representative of the larger population that the study aims to analyze.
There are several ways sampling bias can occur, such as:
What is exploratory analysis? It is the process of analyzing and summarizing datasets to uncover patterns, trends, relationships, and anomalies before inference.
What is inference? It is the process of drawing conclusions about a population based on sample data. This involves using data from a sample to make generalizations, predictions, or decisions about a larger group.
Almost all statistical methods are based on the notion of implied randomness.
The most commonly used random sampling techniques are simple, stratified, and cluster sampling.
What is simple random sampling? Cases are selected at random from the population, so that every case is equally likely to be chosen and there is no implied connection between the selected cases.
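As a minimal sketch of simple random sampling, the snippet below draws 10 cases from a made-up population of 100 ID numbers. The population, the sample size, and the seed are assumptions chosen only for illustration.

```python
import random

# ASSUMPTION: a hypothetical population of 100 individuals, identified by ID.
population = list(range(1, 101))

# A fixed seed makes the draw reproducible; any seed would do.
rng = random.Random(161)

# random.sample draws without replacement, so no case is picked twice
# and every subset of 10 cases is equally likely.
srs = rng.sample(population, k=10)
print(sorted(srs))
```

Because the draw is made without replacement, the sample always contains 10 distinct cases.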
What is stratified sampling? The population is divided into groups called strata, which are made up of similar observations. We take a simple random sample from each stratum.
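One way to sketch stratified sampling is to group the cases by stratum and then take a simple random sample inside each group. The class-year strata, the population sizes, and the helper name `stratified_sample` are hypothetical.

```python
import random

rng = random.Random(161)

# ASSUMPTION: a hypothetical population of (id, stratum) pairs.
population = [(i, "freshman") for i in range(60)] + \
             [(i, "senior") for i in range(60, 100)]

def stratified_sample(units, per_stratum, rng):
    """Take a simple random sample of size per_stratum from each stratum."""
    strata = {}
    for unit_id, stratum in units:
        strata.setdefault(stratum, []).append(unit_id)
    return {name: rng.sample(members, per_stratum)
            for name, members in strata.items()}

sample = stratified_sample(population, per_stratum=5, rng=rng)
print(sample)
```

Every stratum is guaranteed to appear in the sample, which a simple random sample of the whole population does not guarantee.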
What is cluster sampling? Clusters are usually not made up of homogeneous observations. We take a simple random sample of clusters, and then sample all observations in those clusters.
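A minimal sketch of cluster sampling: the random draw happens at the cluster level, and every observation in a selected cluster is kept. The city-block clusters and household labels are invented for illustration.

```python
import random

rng = random.Random(161)

# ASSUMPTION: 6 hypothetical clusters (city blocks) of 5 households each.
clusters = {f"block_{b}": [f"b{b}_h{h}" for h in range(5)] for b in range(6)}

# Step 1: simple random sample of clusters.
chosen = rng.sample(sorted(clusters), k=2)

# Step 2: keep every observation inside the chosen clusters.
cluster_sample = [unit for name in chosen for unit in clusters[name]]
print(chosen, cluster_sample)
```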
What is multistage sampling? Clusters are usually not made up of homogeneous observations. We take a simple random sample of clusters, and then take a simple random sample of observations from the sampled clusters.
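Multistage sampling can be sketched as two random draws: first clusters, then observations within each selected cluster. The clusters, sizes, and the helper name `multistage_sample` are assumptions made for this example.

```python
import random

rng = random.Random(161)

# ASSUMPTION: 6 hypothetical clusters (city blocks) of 5 households each.
clusters = {f"block_{b}": [f"b{b}_h{h}" for h in range(5)] for b in range(6)}

def multistage_sample(clusters, n_clusters, n_per_cluster, rng):
    """Stage 1: sample clusters. Stage 2: sample units within each chosen cluster."""
    chosen = rng.sample(sorted(clusters), k=n_clusters)
    return {name: rng.sample(clusters[name], k=n_per_cluster) for name in chosen}

sample = multistage_sample(clusters, n_clusters=3, n_per_cluster=2, rng=rng)
print(sample)
```

The only difference from cluster sampling is the second stage: a subset of each chosen cluster is sampled instead of the whole cluster.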
Scenario: A hospital wants to survey nurse job satisfaction across departments.
| Sampling Method | Process | Example |
|---|---|---|
| Stratified | Divide nurses into departments (strata) and sample proportionally from each group. | 8 emergency, 6 ICU, 10 pediatrics, 16 general medicine |
| Clustered | Divide hospital into floors (clusters) and randomly select entire floors, surveying all nurses there. | Select 2 random floors and survey all nurses on those floors. |
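The hospital scenario can be simulated as a rough sketch. The department sizes (40, 30, 50, 80) and the floor layout are assumptions, not part of the scenario; the sizes were picked so that a 20% sample from each department gives the counts in the table.

```python
import random

rng = random.Random(2025)

# ASSUMPTION: hypothetical department sizes, chosen so that a 20% sample
# yields the table's counts (8, 6, 10, 16).
departments = {"emergency": 40, "ICU": 30, "pediatrics": 50, "general medicine": 80}
nurses = {dept: [f"{dept}-{i}" for i in range(size)]
          for dept, size in departments.items()}

# Stratified: sample 20% of the nurses within every department.
stratified = {dept: rng.sample(staff, k=len(staff) * 20 // 100)
              for dept, staff in nurses.items()}
counts = {dept: len(chosen) for dept, chosen in stratified.items()}
print(counts)
# {'emergency': 8, 'ICU': 6, 'pediatrics': 10, 'general medicine': 16}

# ASSUMPTION: 5 hypothetical floors with 40 nurses each.
floors = {f"floor {f}": [f"floor{f}-nurse{i}" for i in range(40)]
          for f in range(1, 6)}

# Clustered: pick 2 floors at random and survey every nurse on them.
chosen_floors = rng.sample(sorted(floors), k=2)
clustered = [nurse for floor in chosen_floors for nurse in floors[floor]]
print(chosen_floors, len(clustered))
```

In the stratified draw every department is represented; in the clustered draw only the nurses on the two selected floors are surveyed.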
Key Difference: