Study Design and Inference

Applied Statistics

MTH-361A | Spring 2026 | University of Portland

Objectives

Relationship Among Variables

Research Question: Does there appear to be a relationship between the hours of study per week and the GPA of a student?

\(\star\) As hours of study increase, GPA also tends to increase, but for study hours between roughly 0 and 30 there is a lot of variation in GPA. One student has a GPA above 4.0, which is likely a data error.

Explanatory vs Response Variables

\[\text{explanatory variable} \xrightarrow{\text{might affect}} \text{response variable}\]

Associated vs Independent Variables

Population vs Sample

Research Question: Can people become better, more efficient runners on their own, merely by running?

\(\star\) The random sample taken can only be generalized to adult women, not to all people.

Anecdotal Evidence

Research Question: Can smoking contribute to negative health outcomes?

\(\star\) Anecdotal evidence refers to information or conclusions drawn from personal experiences, individual stories, or isolated examples rather than systematic data collection or rigorous scientific analysis.

Early Smoking Research

Research Question: Can smoking contribute to negative health outcomes?

\(\star\) Overgeneralization is a logical fallacy where a conclusion is drawn from insufficient or unrepresentative evidence, applying it too broadly.

Census

Wouldn’t it be better to just include everyone and “sample” the entire population?

This is called a Census.

Problems with taking a census:

Sampling Bias (1/3)

What is sampling bias? It occurs when the sample collected for a study or survey is not representative of the larger population that the study aims to analyze.

There are several ways sampling bias can occur, such as:

Sampling Bias (2/3)

NPR: Illegal Immigrants Reluctant to Fill Out Census Form

NPR: See 200 Years Of Twists And Turns Of Census Citizenship Questions

Sampling Bias (3/3)

US Census: Citizenship Question Effects on Household Survey Response

Exploratory Analysis to Inference

What is Exploratory Analysis? It is the process of analyzing and summarizing datasets to uncover patterns, trends, relationships, and anomalies before inference.

Inference

What is inference? It is the process of drawing conclusions about a population based on sample data. This involves using data from a sample to make generalizations, predictions, or decisions about a larger group.

Types of Inference

Aspect | Parameter Estimation | Hypothesis Testing
Goal | Estimate an unknown population value | Assess claims about a population value
Methods | Point estimation: a single-value estimate (e.g., the sample mean); interval estimation: a range of plausible values (e.g., a confidence interval) | State a null and an alternative hypothesis; compute a test statistic and either compare it to a critical value or compare its p-value to a significance level
Key Concept | Focuses on precision in estimation (confidence intervals) | Focuses on decision-making based on evidence (reject or fail to reject the null hypothesis)

\(\star\) Parameter estimation focuses on finding the best estimate of an unknown population value, while hypothesis testing determines whether there is enough evidence to support or reject a claim about the population.
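The two types of inference can be illustrated side by side with a minimal sketch. It assumes a hypothetical sample of 50 simulated values (not real data), uses a large-sample normal approximation because Python's standard library has no t-distribution, and tests a hypothetical null value `mu_0 = 3.2`.

```python
import math
import random
from statistics import NormalDist, mean, stdev

random.seed(1)
# Hypothetical sample: 50 simulated values standing in for real data
sample = [random.gauss(3.0, 0.5) for _ in range(50)]

n = len(sample)
x_bar = mean(sample)                  # point estimate of the population mean
se = stdev(sample) / math.sqrt(n)     # standard error of the sample mean
z_star = NormalDist().inv_cdf(0.975)  # about 1.96 for 95% confidence

# Interval estimate: 95% confidence interval (normal approximation, large n)
ci = (x_bar - z_star * se, x_bar + z_star * se)

# Hypothesis test of H0: mu = mu_0 vs HA: mu != mu_0 (mu_0 is hypothetical)
mu_0 = 3.2
z = (x_bar - mu_0) / se                        # test statistic
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value
decision = "reject H0" if p_value < 0.05 else "fail to reject H0"

print(f"point estimate: {x_bar:.3f}")
print(f"95% CI: ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, p-value = {p_value:.4f}: {decision}")
```

The same sample feeds both procedures: estimation reports a value and a range, while testing turns the data into a reject / fail-to-reject decision at the 0.05 level.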

Obtaining Good Samples

Simple Random Sample

What is simple random sampling? Randomly select cases from the population, where there is no implied connection between the points that are selected.
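A simple random sample can be drawn with Python's `random.sample`, which selects without replacement so that every subset of the requested size is equally likely. The population of 1000 student IDs below is hypothetical.

```python
import random

# Hypothetical population: student IDs 1..1000
population = list(range(1, 1001))

rng = random.Random(42)           # seeded so the draw is reproducible
srs = rng.sample(population, 25)  # 25 distinct cases, chosen without replacement

print(sorted(srs))
```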

Stratified Sample

What is stratified sampling? Strata are made up of similar observations. We take a simple random sample from each stratum.
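A minimal stratified-sampling sketch: group the population by stratum, then take a simple random sample inside each stratum. The class-year strata, their sizes, and the choice of an equal 10 cases per stratum are assumptions for illustration; `stratified_sample` is a hypothetical helper, not a library function.

```python
import random

# Hypothetical population: each student tagged with a class year (the strata)
years = ["first-year"] * 400 + ["sophomore"] * 300 + ["junior"] * 200 + ["senior"] * 100
population = [(student_id, year) for student_id, year in enumerate(years)]

def stratified_sample(units, stratum_of, n_per_stratum, rng):
    """Take a simple random sample of n_per_stratum units from every stratum."""
    strata = {}
    for u in units:
        strata.setdefault(stratum_of(u), []).append(u)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, n_per_stratum))
    return sample

rng = random.Random(7)
sample = stratified_sample(population, lambda u: u[1], 10, rng)
```

Every class year is guaranteed exactly 10 cases, which a single simple random sample of 40 would not guarantee.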

Cluster Sample

What is cluster sampling? Clusters are usually not made up of homogeneous observations. We take a simple random sample of clusters, and then sample all observations in that cluster.
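In a cluster sample the randomness is only at the cluster level; every observation in a chosen cluster is kept. The sketch below assumes a hypothetical school with 20 classrooms (clusters) of 25 students each.

```python
import random

# Hypothetical population: 20 classrooms (clusters), each holding 25 students
clusters = {f"room-{r}": [f"room-{r}-student-{s}" for s in range(25)] for r in range(20)}

rng = random.Random(3)
chosen_rooms = rng.sample(sorted(clusters), 4)  # simple random sample of 4 clusters

# Cluster sampling: keep EVERY observation in the chosen clusters
cluster_sample = [student for room in chosen_rooms for student in clusters[room]]
```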

Multistage Sample

What is multistage sampling? Clusters are usually not made up of homogeneous observations. We take a simple random sample of clusters, and then take a simple random sample of observations from the sampled clusters.
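Multistage sampling adds a second random step to the cluster design: after sampling clusters, a simple random sample is taken within each one. This sketch reuses the same hypothetical 20-classroom setup and assumes 5 students per chosen classroom.

```python
import random

# Hypothetical population: 20 classrooms (clusters) of 25 students each
clusters = {f"room-{r}": [f"room-{r}-student-{s}" for s in range(25)] for r in range(20)}

rng = random.Random(3)
# Stage 1: simple random sample of clusters
chosen_rooms = rng.sample(sorted(clusters), 4)
# Stage 2: simple random sample of observations WITHIN each chosen cluster
multistage_sample = [s for room in chosen_rooms for s in rng.sample(clusters[room], 5)]
```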

Stratified vs Clustered Sampling

Scenario: A hospital wants to survey nurse job satisfaction across departments.

Sampling Method | Process | Example
Stratified | Divide nurses into departments (strata) and sample proportionally from each group. | 8 emergency, 6 ICU, 10 pediatrics, 16 general medicine
Clustered | Divide the hospital into floors (clusters), randomly select entire floors, and survey all nurses there. | Select 2 random floors and survey all nurses on those floors.

\(\star\) Stratified sampling ensures proportional representation from all departments, while clustered sampling is more practical but may miss some groups.
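Proportional allocation sets each stratum's sample size to the total sample size times that stratum's share of the population. The department headcounts below are hypothetical, chosen only so that a total sample of 40 reproduces the 8 / 6 / 10 / 16 split in the example; `proportional_allocation` is an illustrative helper.

```python
# Hypothetical nurse counts per department (assumed, not given in the scenario)
department_sizes = {
    "emergency": 40,
    "ICU": 30,
    "pediatrics": 50,
    "general medicine": 80,
}

def proportional_allocation(sizes, total_sample):
    """Split total_sample across strata in proportion to stratum size."""
    population_total = sum(sizes.values())
    return {name: round(total_sample * size / population_total) for name, size in sizes.items()}

allocation = proportional_allocation(department_sizes, 40)
print(allocation)
# → {'emergency': 8, 'ICU': 6, 'pediatrics': 10, 'general medicine': 16}
```

With less convenient headcounts, rounding can make the allocations sum to slightly more or less than the target, so in practice the rounded sizes may need a small adjustment.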

Types of Studies

Observational | Experimental
Researchers observe subjects without interference. | Researchers intervene by applying treatments to subjects.
No treatment or manipulation is imposed. | Includes control and treatment groups, ideally with random assignment.
Used to find associations, not causation. | Can determine causal relationships.

\(\star\) Observational studies find patterns, while experimental studies test cause-and-effect.

Smoking and Lung Cancer

Research Question: Is there a relationship between smoking and lung cancer?

Study Design:

Findings:

\(\star\) Since this is an observational study, it cannot prove that smoking causes lung cancer; other factors (e.g., genetics, pollution) may also contribute. However, consistent associations across multiple studies strengthen the case for a causal link.

Types of Observational Studies

Aspect | Case-Control | Cohort (Longitudinal) | Cross-Sectional
Study Design | Compares individuals with a condition (cases) to those without it (controls). | Follows groups of individuals over time, observing exposures and outcomes. | Measures a population at a single point in time, observing various variables.
Main Focus | Identifying exposures or risk factors associated with an outcome. | Observing how exposures relate to outcomes over time. | Examining the prevalence of variables or conditions at a given time.
Temporal Sequence | Retrospective: looks back in time to find past exposures. | Usually prospective: follows participants forward in time (a cohort can also be assembled retrospectively from past records). | None: a snapshot of the population at a single time point.
Data Collection | Collects past data (often from medical records or interviews). | Collects data over time, often through repeated observations or surveys. | Collects data at one point in time.

\(\star\) Case-control studies look back at past data, cohort studies follow participants over time, and cross-sectional studies look at data from a single point in time.

Strengths and Limitations of Observational Studies

Aspect | Case-Control | Cohort (Longitudinal) | Cross-Sectional
Strengths | Good for studying rare diseases; cost-effective and relatively quick. | Can establish temporal order (exposure before outcome); useful for studying possible causes and their effects. | Quick, inexpensive, good for identifying associations.
Limitations | Cannot establish causality; prone to recall bias. | Expensive, time-consuming, and prone to participant attrition. | Cannot determine causality, only associations.

\(\star\) The key limitation of observational studies is that they cannot determine causality, only associations.

Prospective vs Retrospective Observational Studies

Study Type | Description | Strengths | Limitations
Prospective Study | Researchers follow subjects forward in time, starting with an exposure and observing future outcomes. | Can establish a temporal relationship between exposure and outcome; reduces recall bias. | Expensive, time-consuming, potential participant dropout.
Retrospective Study | Researchers analyze past data, identifying subjects with an outcome and looking back to determine exposure. | Quick, cost-effective, useful for rare diseases or long-term effects. | Prone to recall bias and missing or incomplete data; cannot establish causality.

\(\star\) Prospective studies collect data from the present forward into the future, while retrospective studies use data already recorded in the past.

Hypertension and Stroke Incidence

Research Question: Is there a relationship between hypertension and stroke incidence in an older population?

Study Design:

Findings:

\(\star\) This is an example of a retrospective cohort study because the data were recorded in the past and the design follows groups (cohorts) of individuals to compare their outcomes.

Energy Gels

Research Question: Do energy gels make a person run faster?

Study Design:

Findings:

\(\star\) This is an example of an experimental study because the researchers impose an intervention: the treatment group receives energy gels, and its outcomes are compared with those of a control group that does not.
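Random assignment is what lets an experiment support causal conclusions: shuffling the participants and splitting the list in half gives every runner the same chance of landing in either group. The 20 runner IDs below are hypothetical placeholders.

```python
import random

# Hypothetical runner IDs; the actual study participants are not listed here
runners = [f"runner-{i}" for i in range(1, 21)]

rng = random.Random(11)
shuffled = runners[:]        # copy so the original list is untouched
rng.shuffle(shuffled)

half = len(shuffled) // 2
treatment = shuffled[:half]  # receive energy gels
control = shuffled[half:]    # no energy gels
```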

Blocking

Since it is suspected that energy gels might affect pro and amateur athletes differently, we block for pro status.

Study Design:

Findings:

\(\dagger\) Why is blocking important? Can you think of other variables to block for?

\(\star\) Since this is a randomized experiment, a difference in running speed between the groups can be attributed to the energy gels, so we can conclude a causal relationship between using energy gels and running faster.
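Blocking changes only where the randomization happens: runners are first grouped by pro status, and the random split into treatment and control is done separately inside each block, so both groups get a balanced mix of pros and amateurs. The roster and its 8-pro / 12-amateur split are hypothetical, and `block_randomize` is an illustrative helper.

```python
import random

# Hypothetical roster: (runner ID, is_pro); the pro/amateur split is assumed
roster = [(f"runner-{i}", i <= 8) for i in range(1, 21)]  # 8 pros, 12 amateurs

def block_randomize(units, block_of, rng):
    """Group units into blocks, then randomly split each block between treatment and control."""
    blocks = {}
    for u in units:
        blocks.setdefault(block_of(u), []).append(u)
    assignment = {}
    for members in blocks.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        half = len(shuffled) // 2
        for runner_id, _ in shuffled[:half]:
            assignment[runner_id] = "treatment"
        for runner_id, _ in shuffled[half:]:
            assignment[runner_id] = "control"
    return assignment

assignment = block_randomize(roster, lambda u: u[1], random.Random(5))
```

Without blocking, a single shuffle could by chance put most pros in one group and confound pro status with the treatment effect.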

Principles of Experimental Design

Principle | Description
Control | Compare the treatment of interest to a control group.
Randomize | Randomly assign subjects to treatments, and randomly sample from the population whenever possible.
Replicate | Within a study, replicate by collecting a sufficiently large sample; alternatively, replicate the entire study.
Block | If variables are known or suspected to affect the response variable, first group subjects into blocks based on these variables, then randomly assign cases within each block to treatment groups.

\(\star\) Experimental studies establish a cause-and-effect relationship by manipulating independent variables and observing their impact on dependent variables while controlling for confounding factors.

Blocking vs Explanatory Variables

Aspect | Blocking Variables | Explanatory Variables
Definition | Characteristics that experimental units come with and that we want to control for. | Variables that we manipulate or observe to explain the outcome of the experiment.
Purpose | Used to reduce variability by grouping experimental units with similar traits. | Used to explore or test the effect of a treatment or intervention on outcomes.
Role in Experiment | Serve as a way to control for potential confounders and reduce bias. | Act as the independent variable(s) whose effect on the dependent variable is tested.
Timing in Experiment | Applied before random assignment to ensure balanced groups. | Manipulated or measured during the experiment to observe their effect.

\(\star\) Explanatory variables are factors tested for their impact, while blocking groups subjects to reduce confounding effects.

More Experimental Design Terminology

Random Assignment vs Random Sampling