MTH-161D | Spring 2025 | University of Portland
March 17, 2025
These slides are derived from Diez et al. (2012).
The guiding principle of statistics is statistical thinking.
Statistical Thinking in the Data Science Life Cycle
Types of Inference
 | Parameter Estimation | Hypothesis Testing |
---|---|---|
Goal | Estimate an unknown population value | Assess claims about a population value |
Methods | Point estimation: a single-value estimate (e.g., sample mean). Interval estimation: a range of plausible values (e.g., confidence interval). | State a null and an alternative hypothesis. Compute a test statistic and compare it to a critical value, or compare its p-value to a significance level. |
Key Concept | Focuses on precision in estimation (confidence intervals) | Focuses on decision-making based on evidence (reject or fail to reject the null hypothesis) |
Confidence Intervals
\[\text{point estimate} \pm z^{\star} \times \text{SE}.\]
Find the \(z^{\star}\) for a 92% confidence level.
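Process: compute the lower tail probability, \((1 - 0.92)/2 = 0.04\), then find the corresponding quantile of the standard normal distribution with the qnorm() function in R.
Using R: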
cl <- 0.92 # confidence level
lt <- (1 - cl) / 2 # lower tail probability
qnorm(1 - lt, 0, 1) # z star: the upper quantile, so the value is positive
## [1] 1.750686
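As a quick check on the critical value, the area under the standard normal curve between \(-z^{\star}\) and \(z^{\star}\) should equal the confidence level. The sketch below wraps the calculation in a helper; the name `z_star` is introduced here for illustration and is not from the slides.

```r
# Hypothetical helper (not from the slides): critical value for a confidence level
z_star <- function(cl) {
  qnorm(1 - (1 - cl) / 2)  # upper quantile, so the result is positive
}

z92 <- z_star(0.92)
central_area <- pnorm(z92) - pnorm(-z92)  # area between -z* and z*

z92           # critical value for 92% confidence
central_area  # should recover the confidence level, 0.92
```

The same helper should give roughly 1.96 for a 95% confidence level.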
Two scientists want to know if a certain drug is effective against high blood pressure.
Which is the better way to test this drug?
\(\star\) Answer: The second scientist's design, in which 500 get the drug and 500 don’t. The untreated group provides a comparison for the treated group.
The General Social Survey (GSS) asks the same question. Below is the distribution of responses from the 2010 survey:
Answer | Count |
---|---|
All 1000 get the drug | 99 |
500 get the drug 500 don’t | 571 |
Total | 670 |
We would like to estimate the proportion of all Americans who have good intuition about experimental design, i.e., who would answer “500 get the drug 500 don’t”. What are the parameter of interest and the point estimate?
Parameter of interest: proportion of all Americans who have good intuition about experimental design. \[p \longrightarrow \text{a population proportion}\]
Point estimate: proportion of sampled Americans who have good intuition about experimental design. \[\hat{p} \longrightarrow \text{a sample proportion}\]
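For the GSS counts in the table above, the point estimate can be computed directly. This is a minimal sketch; the variable names are chosen here for illustration.

```r
# Counts from the 2010 GSS table above
correct <- 571  # answered "500 get the drug 500 don't"
n <- 670        # total respondents

p_hat <- correct / n  # sample proportion, the point estimate of p
round(p_hat, 3)
```

The slides round this value to 0.85 in the calculations that follow.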
What percent of all Americans have good intuition about experimental design, i.e. would answer “500 get the drug 500 don’t”?
We can answer this research question using a confidence interval, which we know is always of the form \[\text{point estimate} \pm z^{\star} \times \text{SE}.\]
Standard error (SE) of a sample proportion \[SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}.\]
Sample proportions will be nearly normally distributed with mean equal to the population proportion, \(p\), and standard error equal to \(\sqrt{\frac{p(1-p)}{n}}\).
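This is true only under certain conditions: the observations are independent, and the sample is large enough that we expect at least 10 successes and 10 failures, i.e. \(np \ge 10\) and \(n(1-p) \ge 10\).
Note: When \(p\) is unknown, which is usually the case, we use \(\hat{p}\) in its place to check the conditions and to compute the standard error.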
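The sampling distribution described above can be illustrated with a small simulation. The sketch below assumes a population proportion of 0.85 purely for illustration (the true value is unknown), draws many samples of size 670, and compares the simulated spread of \(\hat{p}\) with the theoretical standard error. The seed and number of repetitions are arbitrary choices.

```r
set.seed(161)  # arbitrary seed, for reproducibility
p <- 0.85      # assumed population proportion (illustration only)
n <- 670       # sample size, matching the GSS example
reps <- 10000  # number of simulated samples

# Each simulated sample proportion is (number of successes) / n
p_hats <- rbinom(reps, size = n, prob = p) / n

mean(p_hats)           # should be close to p
sd(p_hats)             # should be close to the theoretical SE
sqrt(p * (1 - p) / n)  # theoretical SE
```

A histogram of `p_hats`, e.g. `hist(p_hats)`, should look roughly bell-shaped and centered near 0.85.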
The GSS found that 571 out of 670 (85%) sampled Americans answered the question on experimental design correctly. Estimate, using a 95% confidence interval, the proportion of all Americans who have good intuition about experimental design.
Given: \(n = 670\), \(\hat{p} = 0.85\). First check conditions.
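A common version of the large-sample check is the success-failure condition: at least 10 observed successes and 10 observed failures. The sketch below checks it for the GSS counts. Independence cannot be verified in code; it rests on the assumption that the respondents can be treated as a random sample.

```r
n <- 670
successes <- 571           # answered correctly
failures <- n - successes  # answered incorrectly

# Success-failure condition: at least 10 successes and 10 failures
success_failure_ok <- successes >= 10 && failures >= 10
success_failure_ok
```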
We are given that \(n = 670\) and \(\hat{p} = 0.85\). We also just learned that the standard error of the sample proportion is \[SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}.\]
Which of the below is the correct calculation of the 95% confidence interval?
\(\star\) Answer: \(0.85 \pm 1.96 \times \sqrt{\frac{0.85 \times 0.15}{670}} \longrightarrow (0.82,0.88)\)
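The interval can be reproduced in R. This is a sketch that uses the rounded \(\hat{p} = 0.85\) from the slides and substitutes \(\hat{p}\) for the unknown \(p\) in the standard error; the variable names are illustrative.

```r
p_hat <- 0.85      # rounded sample proportion from the slides
n <- 670
z <- qnorm(0.975)  # about 1.96 for 95% confidence

se <- sqrt(p_hat * (1 - p_hat) / n)  # SE with p_hat standing in for p
me <- z * se                         # margin of error
ci <- c(lower = p_hat - me, upper = p_hat + me)
round(ci, 2)
```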
Previously, for \(n=670\), the margin of error was \(1.96 \times \sqrt{\frac{0.85 \times 0.15}{670}} \approx 0.027\).
How many people should you sample in order to cut the margin of error of a 95% confidence interval down to 0.01?
Set the margin of error, \[z^{\star} \times SE_{\hat{p}},\] to be at most 0.01 and solve for \(n\):
\[ \begin{aligned} 1.96 \times \sqrt{\frac{0.85 \times 0.15}{n}} & \le 0.01 \\ 1.96^2 \times \frac{0.85 \times 0.15}{n} & \le 0.01^2 \\ n & \ge \frac{1.96^2 \times 0.85 \times 0.15}{0.01^2} \\ n & \ge 4898.04 \end{aligned} \]
\(\star\) The sample size should be at least 4,899 to have a margin of error of at most 0.01 for a 95% confidence interval.
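The same calculation can be sketched as a small R function. The helper name `sample_size_for_me` and its arguments are introduced here for illustration; it rounds up because a sample size must be a whole number.

```r
# Hypothetical helper: smallest n whose margin of error is at most `me`
sample_size_for_me <- function(p, me, cl = 0.95,
                               z = qnorm(1 - (1 - cl) / 2)) {
  ceiling(z^2 * p * (1 - p) / me^2)
}

n_rounded_z <- sample_size_for_me(p = 0.85, me = 0.01, z = 1.96)  # z rounded as in the slides
n_exact_z <- sample_size_for_me(p = 0.85, me = 0.01)              # unrounded z from qnorm()

n_rounded_z
n_exact_z
```

With \(z^{\star}\) rounded to 1.96 this gives 4,899, matching the result above. With the unrounded critical value the requirement comes out one smaller, at 4,898, which shows how sensitive the cutoff is to rounding.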