Examining Numerical Data

Elementary Statistics

MTH-161D | Spring 2025 | University of Portland

February 5, 2025

Objectives

These slides are derived from Diez et al. (2012).

Previously… (1/3)

Types of Variables


Previously… (2/3)

Exploratory Analysis

It is the process of analyzing and summarizing datasets to uncover patterns, trends, relationships, and anomalies before inference.

Descriptive statistics

It involves organizing, summarizing, and presenting data in an informative way. It focuses on describing and understanding the main features of a dataset.

For Numerical Variables

For Categorical Variables

Previously… (3/3)

Inference

It is the process of drawing conclusions about a population based on sample data. This involves using data from a sample to make generalizations, predictions, or decisions about a larger group.

Scatterplots

Scatterplots are useful for visualizing the relationship between two numerical variables.

http://www.gapminder.org/world

Dot Plots

Dot plots are useful for visualizing one numerical variable. Darker colors represent areas where there are more observations.

How would you describe the distribution of GPAs in this data set?

Dot Plots and the Mean

The mean, also called the average (marked with a triangle in the above plot), is one way to measure the center of a distribution of data.

The mean GPA is 3.59.
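As a small illustration of how a mean is computed, the sketch below adds up a handful of GPAs and divides by how many there are. The values in `gpas` are hypothetical and invented for this example; they are not the data set shown on the slide, so the result will not equal 3.59.

```python
from statistics import mean

# Hypothetical GPAs, invented for illustration (not the data set from the slides).
gpas = [3.2, 3.5, 3.6, 3.7, 3.8, 3.9, 4.0]

# The mean is the sum of the observations divided by the number of observations.
manual_mean = sum(gpas) / len(gpas)

# The standard library computes the same quantity.
library_mean = mean(gpas)

print(round(manual_mean, 2))
print(round(library_mean, 2))
```

Both lines print the same value, which shows that the "average" is just the total shared equally across all observations.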

Stacked Dot Plots

Higher stacks represent areas where there are more observations, which makes it a little easier to judge the center and the shape of the distribution.
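To make the idea of stacking concrete, here is a minimal text-only sketch that draws one `*` per observation above each distinct value. The function name `stacked_dot_plot` and the GPA values are assumptions made for this example; real plots would normally come from a plotting tool.

```python
from collections import Counter

# Hypothetical GPAs rounded to one decimal place (illustrative only).
gpas = [3.2, 3.5, 3.5, 3.6, 3.6, 3.6, 3.7, 3.7, 3.8, 4.0]

def stacked_dot_plot(values):
    """Return a text stacked dot plot with one column per distinct value."""
    counts = Counter(values)
    levels = sorted(counts)
    tallest = max(counts.values())
    rows = []
    # Build rows from the top of the tallest stack down to the axis.
    for height in range(tallest, 0, -1):
        row = "".join(" *  " if counts[v] >= height else "    " for v in levels)
        rows.append(row.rstrip())
    # Axis labels: each distinct value, left-aligned in a 4-character column.
    rows.append("".join(f"{v:<4}" for v in levels).rstrip())
    return "\n".join(rows)

print(stacked_dot_plot(gpas))
```

The tallest stack sits over the most common value, so the center and the rough shape can be read directly from the heights.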

Histograms

Histograms provide a view of the data density. Higher bars represent where the data are relatively more common.

Bin Width of Histograms

Which one(s) of these histograms are useful? Which reveal too much about the data? Which hide too much?
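The effect of bin width can be sketched with a few lines of code. The example below counts the same hypothetical observations using three different widths; the data, the helper name `bin_counts`, and the chosen widths are all assumptions made for illustration. Empty bins are simply left out of the output.

```python
import math
from collections import Counter

# Hypothetical observations, invented for illustration.
data = [1.2, 1.9, 2.1, 2.4, 2.5, 2.8, 3.0, 3.1, 3.3, 3.9, 4.6, 7.5]

def bin_counts(values, width, start=0.0):
    """Count observations in bins [start + k*width, start + (k+1)*width).

    Returns a dict mapping each non-empty bin's left edge to its count.
    """
    counts = Counter(math.floor((v - start) / width) for v in values)
    return {start + k * width: counts[k] for k in sorted(counts)}

for width in (0.5, 2.0, 10.0):
    print(f"bin width {width}")
    for left, count in bin_counts(data, width).items():
        print(f"  [{left:4.1f}, {left + width:4.1f}) " + "#" * count)
```

With a very narrow width, most bins hold only one or two observations and the picture looks noisy; with a very wide width, everything lands in a single bin and the shape is hidden. A useful width sits somewhere in between.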

Distribution Shapes: Modality

Does the histogram have a single prominent peak (unimodal), several prominent peaks (bimodal/multimodal), or no apparent peaks (uniform)?

\(\star\) Note: To determine modality, step back and imagine a smooth curve over the histogram. Picture the bars as wooden blocks and drop a limp strand of spaghetti over them: the shape the spaghetti takes approximates that smooth curve.

Distribution Shapes: Skewness

Is the histogram right skewed, left skewed, or symmetric?

\(\star\) Note: Histograms are said to be skewed to the side of the long tail.
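Skewness can also be summarized with a number. The sketch below uses the moment coefficient of skewness, which is one common choice and is not taken from these slides; the function name `moment_skewness` and the three small data sets are assumptions made for illustration. A positive value points to a long right tail, a negative value to a long left tail, and a value near zero to rough symmetry.

```python
from statistics import mean

def moment_skewness(values):
    """Moment coefficient of skewness: m3 / m2**1.5, using population moments."""
    n = len(values)
    center = mean(values)
    m2 = sum((v - center) ** 2 for v in values) / n  # second central moment
    m3 = sum((v - center) ** 3 for v in values) / n  # third central moment
    return m3 / m2 ** 1.5

# Hypothetical data sets, invented for illustration.
right_tail = [1, 2, 2, 3, 3, 3, 4, 10]   # one large value stretches the right tail
left_tail = [-v for v in right_tail]      # mirror image: long left tail
symmetric = [1, 2, 3, 4, 5]

for name, values in [("right_tail", right_tail),
                     ("left_tail", left_tail),
                     ("symmetric", symmetric)]:
    print(name, round(moment_skewness(values), 2))
```

The sign of the result matches the side of the long tail, which is the same rule used to name the skew of a histogram.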

Commonly Observed Distribution Shapes

Measures of Central Tendency

Measures of central tendency describe the central or typical value of a dataset, summarizing its distribution. The common measures of central tendency are the mean, the median, and the mode.
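As a quick illustration, the mean, median, and mode of a small data set can be computed with Python's standard `statistics` module. The numbers in `values` are hypothetical and chosen so that one large observation pulls the mean upward.

```python
from statistics import mean, median, multimode

# Hypothetical observations, invented for illustration.
values = [2, 3, 3, 4, 5, 6, 12]

print("mean:  ", mean(values))       # sum of the values divided by their count
print("median:", median(values))     # middle value once the data are sorted
print("mode:  ", multimode(values))  # most frequent value(s), returned as a list
```

Here the single large value (12) raises the mean above the median, while the mode stays at the most common value.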

Skewness and Measures of Central Tendency (1/3)

Right skewed (typically): Mode \(<\) Median \(<\) Mean

Skewness and Measures of Central Tendency (2/3)

Left skewed (typically): Mean \(<\) Median \(<\) Mode

Skewness and Measures of Central Tendency (3/3)

Symmetric and unimodal: Mean \(=\) Median \(=\) Mode

Activity: Identify the Shape of Distribution

  1. Make sure you have a copy of the W 2/5 Worksheet. This will be handed out physically and it is also digitally available on Moodle.
  2. Work on your worksheet by yourself for 10 minutes. Please read the instructions carefully. Ask questions if anything needs clarification.
  3. Get together with another student.
  4. Discuss your results.
  5. Submit your worksheet on Moodle as a .pdf file.

References

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2012). OpenIntro statistics (4th ed.). OpenIntro. https://www.openintro.org/book/os/