Basics of Visualizations

Applied Statistics

MTH-361A | Spring 2026 | University of Portland

Objectives

tidyverse Core Packages for Data Visualizations

tidyverse is a collection of packages suited for data processing and visualization.

Core package specifically for data visualization:

  • ggplot2 is a system for creating graphics, where you provide the data, specify how to map variables to plots, and it handles the details.

\(\star\) Make sure that you load the tidyverse package before you use any functions in ggplot2.

library(tidyverse)

Data Visualization Using ggplot2

What is ggplot2?

Why use ggplot2?

The Grammar of Graphics

What is the Grammar of Graphics?

Key Components of ggplot2

Example: iris Data Set Scatter Plots

Convert dataframe into tibble

# convert the built-in iris data frame into a tibble
iris_tibble <- as_tibble(iris)
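A quick way to confirm that the conversion worked, as a minimal sketch. It assumes as_tibble() as the conversion function; is_tibble() and dim() are used only to inspect the result.

```r
library(tidyverse)

# convert the built-in iris data frame into a tibble
iris_tibble <- as_tibble(iris)

# check that the object is a tibble
is_tibble(iris_tibble)

# check the number of rows (observations) and columns (variables)
dim(iris_tibble)

# printing a tibble shows only the first rows and the type of each column
iris_tibble
```

Printing the tibble is a convenient way to see which variables are numerical and which are categorical before mapping them in aes().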

Plotting iris lengths

# establish data and variables
ggplot(
    # dataframe
    iris_tibble,
    # aesthetics
    aes(
      # x-axis using a numerical variable
      x = Sepal.Length,
      # y-axis using a numerical variable
      y = Petal.Length
      )
    ) +
  # draw scatter plot
  geom_point()

Layered Approach

\(\star\) Note that the + operator here is used to “add” a layer to the plot, not to add numbers.
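One way to see the layered approach is to build the plot one step at a time and store each step in an object. This is a sketch only: the object names base_plot, scatter_plot, and labeled_plot are hypothetical, and labs() is used as one example of a component that can be added with +.

```r
library(tidyverse)

iris_tibble <- as_tibble(iris)

# step 1: data and aesthetic mappings only; this draws empty axes
base_plot <- ggplot(
    iris_tibble,
    aes(x = Sepal.Length, y = Petal.Length)
    )

# step 2: add a scatter plot layer on top of the base plot
scatter_plot <- base_plot +
  geom_point()

# step 3: add axis labels and a title (iris measurements are in centimeters)
labeled_plot <- scatter_plot +
  labs(
    x = "Sepal length (cm)",
    y = "Petal length (cm)",
    title = "Iris sepal length vs. petal length"
    )

# printing the object draws the plot
labeled_plot
```

Each + returns a new plot object, so base_plot and scatter_plot remain unchanged and can be reused.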

Aesthetics

The aes() function maps data variables to visual properties like position, color, size, and shape.

Plotting iris lengths by species

# establish data and variables
ggplot(
    # dataframe
    iris_tibble,
    # aesthetics
    aes(
      # x-axis using a numerical variable
      x = Sepal.Length,
      # y-axis using a numerical variable
      y = Petal.Length,
      # color each point using a categorical variable
      color = Species
      )
    ) +
  # draw scatter plot
  geom_point()

Common Aesthetic Mappings

\(\star\) Note that aes() is called inside ggplot() and passed as its second argument (named mapping), right after the data.
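The difference between mapping a variable to an aesthetic and setting an aesthetic to a fixed value can be sketched as follows. The object names mapped_plot and fixed_plot are hypothetical, and the choices of shape, size = 2, and "blue" are only for illustration.

```r
library(tidyverse)

iris_tibble <- as_tibble(iris)

# mapping: color and shape vary with Species, so they go inside aes()
mapped_plot <- ggplot(
    iris_tibble,
    aes(
      x = Sepal.Length,
      y = Petal.Length,
      color = Species,
      shape = Species
      )
    ) +
  # size is a fixed value for all points, so it goes outside aes()
  geom_point(size = 2)

# setting: every point gets the same color, so color goes outside aes()
fixed_plot <- ggplot(
    iris_tibble,
    aes(x = Sepal.Length, y = Petal.Length)
    ) +
  geom_point(color = "blue")

mapped_plot
fixed_plot
```

Only mapped_plot gets a legend, because a legend explains how a variable is mapped to a visual property.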

Layering

Using the + operator allows us to add layers to the plot, which is used for customizing the plot or adding more information.

Plotting iris lengths by species with regression lines

# establish data and variables
ggplot(
    # dataframe
    iris_tibble,
    # aesthetics
    aes(
      # x-axis using a numerical variable
      x = Sepal.Length,
      # y-axis using a numerical variable
      y = Petal.Length,
      # color each point using a categorical variable
      color = Species,
      # group by the same variable so each species gets its own regression line
      group = Species
      )
    ) +
  # draw scatter plot
  geom_point() +
  # add regression lines
  geom_smooth(
    # define model: linear regression of y on x
    method = "lm", formula = y ~ x,
    # define color
    color = "black"
    )

Layering to add more information

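\(\star\) Key Idea: Every subsequent layer inherits the aesthetic mappings defined by aes() in the ggplot() call, unless the layer sets its own. In the plot above, geom_smooth() inherits x, y, and group, but replaces the inherited color with the fixed value color = "black".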
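To see inheritance at work, compare with a version in which color is mapped inside geom_point() only. This is a sketch, and the object name single_line_plot is hypothetical. Because the Species mapping now belongs to the point layer alone, geom_smooth() inherits only x and y and fits a single regression line to all observations.

```r
library(tidyverse)

iris_tibble <- as_tibble(iris)

single_line_plot <- ggplot(
    iris_tibble,
    # x and y are defined here, so every layer inherits them
    aes(x = Sepal.Length, y = Petal.Length)
    ) +
  # color is mapped in this layer only
  geom_point(aes(color = Species)) +
  # this layer does not see Species, so it draws one line for all points
  geom_smooth(method = "lm", formula = y ~ x, color = "black")

single_line_plot
```

Mappings placed in ggplot() are shared by all layers, while mappings placed in a geom apply to that layer only.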

Common Geometries for Numerical Variables

Geom Function      Description
geom_point()       Scatter plot for visualizing the relationship between two numerical variables.
geom_line()        Line plot for trends over time or continuous sequences.
geom_histogram()   Histogram for visualizing the distribution of a single numerical variable.
geom_dotplot()     Dot plot in which each dot represents one observation in a distribution.
geom_boxplot()     Box plot for showing distributions and detecting outliers.

\(\star\) Be careful when defining variables in aes(). For example, geom_histogram() only requires an x-axis variable, as it plots the distribution of a single numerical variable.
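As a contrast to geom_histogram(), a box plot that compares groups takes two variables. The sketch below assumes a categorical variable on the x-axis and a numerical variable on the y-axis; the object name box_plot is hypothetical.

```r
library(tidyverse)

iris_tibble <- as_tibble(iris)

box_plot <- ggplot(
    iris_tibble,
    aes(
      # x-axis using a categorical variable (one box per species)
      x = Species,
      # y-axis using a numerical variable
      y = Sepal.Length
      )
    ) +
  # draw box plots
  geom_boxplot()

box_plot
```

The same aes() call would not suit geom_histogram(), which computes the y-axis (the counts) itself.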

Example: iris Data Set Histogram

Plotting the distribution of iris sepal lengths by species

# establish data and variables
ggplot(
    # dataframe
    iris_tibble,
    # aesthetics
    aes(
      # x-axis using a numerical variable
      x = Sepal.Length,
      # fill bars using a categorical variable
      fill = Species
      )
    ) +
  # draw histogram
  geom_histogram(
    # define number of bins
    bins = 10
    )

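\(\star\) The bins argument of geom_histogram() sets the number of bins: fewer bins give a coarser picture of the distribution, while more bins show finer detail. If neither bins nor binwidth is given, ggplot2 uses 30 bins by default and prints a message asking you to pick a better value.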
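The effect of the bin choice is easiest to see by drawing the same variable with different settings. In this sketch the object names and the values 5, 30, and 0.25 are chosen only for illustration; binwidth is an alternative to bins that fixes the width of each bin in the units of the variable.

```r
library(tidyverse)

iris_tibble <- as_tibble(iris)

# few bins: coarse view of the distribution
coarse_hist <- ggplot(iris_tibble, aes(x = Sepal.Length)) +
  geom_histogram(bins = 5)

# many bins: finer detail, but noisier
fine_hist <- ggplot(iris_tibble, aes(x = Sepal.Length)) +
  geom_histogram(bins = 30)

# alternative: set the width of each bin (here 0.25 cm) instead of the count
width_hist <- ggplot(iris_tibble, aes(x = Sepal.Length)) +
  geom_histogram(binwidth = 0.25)

coarse_hist
fine_hist
width_hist
```

Trying a few values and comparing the resulting shapes is a reasonable way to choose a setting for a given data set.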