MTH-361A | Spring 2025 | University of Portland
March 17, 2025
The guiding principle of statistics is statistical thinking.
Statistical Thinking in the Data Science Life Cycle
Types of Inference
Aspect | Parameter Estimation | Hypothesis Testing |
---|---|---|
Goal | Estimate an unknown population value | Assess claims about a population value |
Methods | Point Estimation: A single value estimate (e.g., sample mean). Interval Estimation: A range of plausible values (e.g., confidence interval). | State a null and an alternative hypothesis. Compute a test statistic and compare it to a critical value, or compare its p-value to the significance level. |
Key Concept | Focuses on precision in estimation (confidence intervals) | Focuses on decision-making based on evidence (reject or fail to reject the null hypothesis) |
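The two ways of comparing a test statistic to a threshold can be sketched in R. This is a minimal illustration, assuming a one-sample z-test with a known population standard deviation; every number below is hypothetical and chosen only for the example.

```r
# Hypothetical sample summary (illustrative numbers, not course data)
x_bar <- 103.2   # sample mean
mu_0  <- 100     # population mean claimed under the null hypothesis
sigma <- 15      # population standard deviation, assumed known
n     <- 64      # sample size
alpha <- 0.05    # significance level

se <- sigma / sqrt(n)            # standard error of the sample mean
z  <- (x_bar - mu_0) / se        # test statistic

z_crit  <- qnorm(1 - alpha / 2)  # two-sided critical value
p_value <- 2 * pnorm(-abs(z))    # two-sided p-value

# The two decision rules are equivalent and always give the same answer
reject_by_critical_value <- abs(z) > z_crit
reject_by_p_value        <- p_value < alpha
```

Comparing the test statistic to the critical value and comparing the p-value to \(\alpha\) are two views of the same rule, so the two logical values agree.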
Confidence Intervals
\[\text{point estimate} \pm z^{\star} \times \text{SE}.\]
Find the \(z^{\star}\) for a 92% confidence level.
Process: use the qnorm() function in R.
Using R:
cl <- 0.92 # confidence level
lt <- (1-cl)/2 # lower tail probability
qnorm(lt,0,1) # lower-tail quantile; z star is its absolute value
## [1] -1.750686
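As a follow-up, the interval formula above can be applied to data. The sketch below assumes a small made-up sample `x` (not from the course) and uses the sample standard deviation to estimate the standard error; with a sample this small, a t-based multiplier would usually be preferred, so treat this only as an illustration of the formula.

```r
# Hypothetical sample (made-up values for illustration only)
x <- c(12.1, 9.8, 11.4, 10.6, 13.2, 10.9, 11.7, 12.5, 9.9, 11.3)

cl     <- 0.92                     # confidence level
z_star <- qnorm(1 - (1 - cl) / 2)  # upper-tail quantile, so z_star is positive

point_estimate <- mean(x)              # point estimate of the population mean
se <- sd(x) / sqrt(length(x))          # estimated standard error of the mean

# point estimate +/- z_star * SE
ci <- point_estimate + c(-1, 1) * z_star * se
ci
```

Asking for the upper-tail quantile directly returns the positive \(z^{\star}\), which avoids having to flip the sign of the lower-tail result.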
Hypothesis testing is a statistical method used to make inferences about a population based on a sample. It helps determine if an observed effect is statistically significant.
Key Concepts:
Decision Rule:
Why is Hypothesis Testing Important?
Scenario:
A pharmaceutical company tests whether a new drug improves recovery rates compared to a placebo.
Test Results:
Conclusion:
There are two possible outcomes of the hypothesis test:
Reject \(H_0\): If the p-value is less than the significance level, we reject the null hypothesis; the data provide enough evidence to support \(H_A\).
Fail to Reject \(H_0\): If the p-value is greater than or equal to the significance level, we fail to reject the null hypothesis. This does not mean that the null hypothesis is true; the data simply do not provide enough evidence against it.
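The decision rule can be sketched in R for a drug-versus-placebo comparison like the scenario above. The counts below are hypothetical (they are not the lecture's test results), and the choice of `prop.test()` with a one-sided alternative is an assumption about how such a comparison might be set up.

```r
# Hypothetical trial counts (NOT the results from the lecture)
recovered <- c(78, 62)    # recoveries: drug group, placebo group
enrolled  <- c(100, 100)  # patients enrolled: drug group, placebo group
alpha <- 0.05             # significance level, chosen before seeing the data

# H0: the two recovery rates are equal
# HA: the drug group's recovery rate is higher than the placebo group's
result <- prop.test(x = recovered, n = enrolled, alternative = "greater")

# Apply the decision rule: compare the p-value to alpha
decision <- if (result$p.value < alpha) "reject H0" else "fail to reject H0"
decision
```

Fixing `alpha` before looking at `result$p.value` keeps the decision rule from being adjusted to fit the data.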
Making statistical decisions means that you have to deal with uncertainties.
Image Source: Statistical Performance Measures by Neeraj Kumar Vaid
This meme might be overused. If you find similar memes in a “non-pregnancy” context, let me know.
What does this all mean? When the p-value is small, i.e., less than a previously set threshold (\(\alpha\)), we say the results are statistically significant. The value of \(\alpha\) represents how rare an event needs to be, assuming \(H_0\) is true, in order for the null hypothesis to be rejected. It is also the probability of committing a Type I error when \(H_0\) is true.
Reality/Decision | Reject \(H_0\) | Fail to reject \(H_0\) |
---|---|---|
\(H_0\) is true | Type I error with probability \(\alpha\) (significance level) | Correct decision with probability \(1-\alpha\) (confidence level) |
\(H_0\) is false | Correct decision with probability \(1-\beta\) (power of test) | Type II error with probability \(\beta\) |
Conclusion errors: Type I error (false positive) or Type II error (false negative)
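The probabilities in the table can be approximated by simulation. The sketch below is an illustration under assumed settings (a one-sample t-test, normally distributed data, a sample size of 30, and a true mean of 0.5 when \(H_0\) is false); none of these values come from the lecture.

```r
set.seed(361)    # arbitrary seed so the sketch is reproducible
alpha  <- 0.05   # significance level
n      <- 30     # sample size per simulated study (assumed)
n_sims <- 5000   # number of simulated studies (assumed)

# Returns TRUE when a one-sample t-test of H0: mu = 0 rejects at level alpha
rejects_h0 <- function(true_mean) {
  x <- rnorm(n, mean = true_mean, sd = 1)
  t.test(x, mu = 0)$p.value < alpha
}

# H0 is true (mu = 0): every rejection is a Type I error
type1_rate <- mean(replicate(n_sims, rejects_h0(0)))

# H0 is false (mu = 0.5): every rejection is a correct decision
power      <- mean(replicate(n_sims, rejects_h0(0.5)))
type2_rate <- 1 - power   # estimate of beta

c(type1_rate = type1_rate, power = power, type2_rate = type2_rate)
```

The estimated Type I error rate should land close to \(\alpha\), while the power and Type II error rate depend on the assumed effect size and sample size.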
Images Source: Type I and Type II errors by Pritha Bhandari
In a criminal trial, where the null hypothesis is that the defendant is innocent, a Type I error occurs when the null hypothesis is incorrectly rejected, leading to a wrongful conviction.
This means that an innocent person is found guilty and sentenced, possibly facing imprisonment or even capital punishment. The consequences extend beyond the individual, affecting their family, reputation, and future opportunities. Additionally, the real perpetrator remains free, potentially committing further crimes.
A Type II error occurs when a false null hypothesis is not rejected, leading to a wrongful acquittal.
This means that a guilty person is found not guilty and released. As a result, justice is not served for the victims, and the criminal may go on to commit additional offenses, putting society at risk. This error can undermine public trust in the legal system, as it fails to hold the guilty accountable.