MTH-361A | Spring 2026 | University of Portland
ggplot2 package
tidyverse Core Packages for Data Visualizations
tidyverse is a collection of packages suited for data processing and visualization.
Core packages specifically for data visualizations:
ggplot2 is a system for creating graphics, where you provide the data, specify how to map variables to plots, and it handles the details.
\(\star\) Make sure that you load the tidyverse package before you use any functions in ggplot2.
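A minimal sketch of the loading step, assuming the tidyverse package is already installed on your machine:

```r
# Load the tidyverse, which attaches ggplot2 along with dplyr, tibble, and others
library(tidyverse)

# Check that ggplot2 is now attached, so its functions can be called directly
"package:ggplot2" %in% search()
```

If the package is missing, library() stops with an error; running install.packages("tidyverse") once would install it.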
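What is ggplot2?
ggplot2 is a powerful R package designed for data visualizations.
It is part of the tidyverse ecosystem.
Plots are built layer by layer with the + operator.
Why use ggplot2?
It works well with other tidyverse packages such as dplyr for data wrangling.
What is the Grammar of Graphics?
The Grammar of Graphics is a framework that describes a plot as a combination of independent components.
The ggplot2 package in R is based on this framework, allowing for a highly customizable and layered approach to data visualization.
Components in ggplot2
Data: The dataset being visualized in tibble form.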
Aesthetics (aes): The mapping of data variables to visual properties like position, color, size, and shape.
Geometries (geom): The type of plot (e.g., points, lines, bars) that represents the data.
Facets: Splitting data into multiple panels for comparison.
Statistics (stat): Computations applied to the data before plotting (e.g., smoothing, binning).
Coordinates (coord): The system defining how data is mapped onto the plot (e.g., Cartesian, polar).
Themes: Controls the overall appearance of the plot, such as background color, grid lines, and fonts.
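The components listed above can be seen together in a single plot. This is only a sketch: the particular facet, statistic, coordinate limits, and theme are illustrative choices, not settings prescribed by these notes.

```r
library(tidyverse)

# Data: the built-in iris data frame, converted to a tibble
iris_tibble <- as_tibble(iris)

p <- ggplot(
  iris_tibble,
  # Aesthetics: map two numerical variables to the axes
  aes(x = Sepal.Length, y = Petal.Length)
) +
  # Geometries: draw each observation as a point
  geom_point() +
  # Facets: one panel per species
  facet_wrap(~ Species) +
  # Statistics: fit a linear model within each panel
  stat_smooth(method = "lm", formula = y ~ x) +
  # Coordinates: Cartesian system with illustrative x-axis limits
  coord_cartesian(xlim = c(4, 8)) +
  # Themes: a light built-in theme
  theme_minimal()

p
```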
iris Data Set Scatter Plots
Convert dataframe into tibble
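The conversion code itself is not shown here. One way it might look, assuming as_tibble() from the tibble package (attached by tidyverse) and the name iris_tibble used by the plots in these notes:

```r
library(tidyverse)

# iris ships with R as a plain data.frame
class(iris)

# Convert it to a tibble and store it under the name used in the plots
iris_tibble <- as_tibble(iris)

# Printing a tibble shows the first rows together with each column's type
iris_tibble
```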
Plotting iris lengths
# establish data and variables
ggplot(
  # dataframe
  iris_tibble,
  # aesthetics
  aes(
    # x-axis using a numerical variable
    x = Sepal.Length,
    # y-axis using a numerical variable
    y = Petal.Length
  )
) +
  # draw scatter plot
  geom_point()
Layered Approach
ggplot(data, aes(...)) defines the dataset and variables.
+ geom_*() specifies the type of plot.
+ facet_*(), + coord_*(), + theme_*() enhance the visualization.
\(\star\) Note that the + operator here is used to “add” a layer, not to add numbers.
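The layered approach can be made visible by storing each step in a variable. This is a sketch; the names base_plot, scatter, and final_plot are hypothetical, and the facet and theme are illustrative choices.

```r
library(tidyverse)
iris_tibble <- as_tibble(iris)

# Step 1: dataset and variables only; printing this shows empty axes
base_plot <- ggplot(iris_tibble, aes(x = Sepal.Length, y = Petal.Length))

# Step 2: add a geometry layer so the observations appear as points
scatter <- base_plot + geom_point()

# Step 3: add optional layers that refine the display
final_plot <- scatter +
  facet_wrap(~ Species) +  # one panel per species
  theme_bw()               # black-and-white theme

final_plot
```

Because each step returns a new plot object, base_plot and scatter stay unchanged and can be reused with different layers.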
The aes() function maps data variables to visual
properties like position, color, size, and shape.
Plotting iris lengths by species
# establish data and variables
ggplot(
  # dataframe
  iris_tibble,
  # aesthetics
  aes(
    # x-axis using a numerical variable
    x = Sepal.Length,
    # y-axis using a numerical variable
    y = Petal.Length,
    # color each point using a categorical variable
    color = Species
  )
) +
  # draw scatter plot
  geom_point()
Common Aesthetics Mappings
x and y: Map variables to the horizontal and vertical axes.
color: Assign colors to different categories.
size: Control the size of points or lines based on a variable.
shape: Change the shape of points according to a categorical variable.
fill: Set the fill color of geometric objects (e.g., bars) by category.
\(\star\) Note that the aes() function is called within the ggplot() function as the second argument.
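Several of these mappings can be combined in one aes() call. The sketch below uses the iris variables from the earlier plots; mapping size to Petal.Width and setting alpha are illustrative choices, not part of the original examples.

```r
library(tidyverse)
iris_tibble <- as_tibble(iris)

p <- ggplot(
  iris_tibble,
  aes(
    x = Sepal.Length,
    y = Petal.Length,
    color = Species,    # categorical variable mapped to point color
    shape = Species,    # categorical variable mapped to point shape
    size = Petal.Width  # numerical variable mapped to point size
  )
) +
  # alpha is set to a fixed value outside aes(), so it applies to every point
  geom_point(alpha = 0.6)

p
```

Values placed inside aes() vary with the data, while values placed outside aes() are constants for the whole layer.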
Using the + operator allows us to add layers to the
plot, which is used for customizing the plot or adding more
information.
Plotting iris lengths by species with regression lines
# establish data and variables
ggplot(
  # dataframe
  iris_tibble,
  # aesthetics
  aes(
    # x-axis using a numerical variable
    x = Sepal.Length,
    # y-axis using a numerical variable
    y = Petal.Length,
    # color each point using a categorical variable
    color = Species,
    # group observations by species, so one line is fitted per species
    group = Species
  )
) +
  # draw scatter plot
  geom_point() +
  # add regression lines
  geom_smooth(
    # define model: linear regression of y on x
    method = "lm", formula = y ~ x,
    # define a fixed line color, overriding the color mapping for this layer
    color = "black"
  )
Layering to add more information
geom_smooth() fits a regression line to each group, then adds this line as a layer on the plot.
\(\star\) Key Idea: All subsequent layers inherit the aes() mappings defined in the ggplot() function, unless a layer supplies its own.
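To see what inheritance means in practice, the sketch below compares mappings placed in ggplot() with mappings placed inside a single layer. The names shared_plot and local_plot are hypothetical.

```r
library(tidyverse)
iris_tibble <- as_tibble(iris)

# Mappings in ggplot() are shared by every layer
shared_plot <- ggplot(
  iris_tibble,
  aes(x = Sepal.Length, y = Petal.Length, color = Species)
) +
  geom_point() +
  # inherits color = Species, so one line is fitted per species
  geom_smooth(method = "lm", formula = y ~ x)

# Mappings inside a layer apply to that layer only
local_plot <- ggplot(
  iris_tibble,
  aes(x = Sepal.Length, y = Petal.Length)
) +
  # color is used by the points only
  geom_point(aes(color = Species)) +
  # no grouping variable is inherited, so one line is fitted to all the data
  geom_smooth(method = "lm", formula = y ~ x)

shared_plot
local_plot
```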
| Geom | Function |
|---|---|
| geom_point() | Scatter plot for visualizing relationships between two numerical variables. |
| geom_line() | Line plot for trends over time or continuous sequences. |
| geom_histogram() | Histogram for visualizing the distribution of a single numerical variable. |
| geom_dotplot() | Shows each dot representing one observation in a distribution. |
| geom_boxplot() | Box plot for showing distributions and detecting outliers. |
\(\star\) Be careful when defining
variables in aes(). For example,
geom_histogram() only requires an x-axis variable, as it
plots the distribution of a single numerical variable.
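As a sketch of how the required aes() variables differ between geoms, the example below draws a box plot and a histogram from the same data. The names box_plot and hist_plot are hypothetical, and giving the box plot a categorical x is a typical choice rather than a requirement.

```r
library(tidyverse)
iris_tibble <- as_tibble(iris)

# Box plot: a categorical variable on x and a numerical variable on y
box_plot <- ggplot(iris_tibble, aes(x = Species, y = Sepal.Length)) +
  geom_boxplot()

# Histogram: only x is mapped; the counts on the y-axis are computed by the geom
hist_plot <- ggplot(iris_tibble, aes(x = Sepal.Length)) +
  geom_histogram(bins = 10)

box_plot
hist_plot
```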
iris Data Set Histogram
Plotting the distribution of iris sepal lengths by species
# establish data and variables
ggplot(
  # dataframe
  iris_tibble,
  # aesthetics
  aes(
    # x-axis using a numerical variable
    x = Sepal.Length,
    # fill bars using a categorical variable
    fill = Species
  )
) +
  # draw histogram
  geom_histogram(
    # define number of bins
    bins = 10
  )
\(\star\) The bins parameter in the geom_histogram() function sets the number of bins, which changes how coarse or fine the displayed distribution looks.
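The effect of the bin setting is easiest to judge by drawing the same histogram several ways. In this sketch the values 5, 30, and 0.25 are arbitrary illustrative choices; binwidth is an alternative geom_histogram() argument that fixes the width of each bin instead of the number of bins.

```r
library(tidyverse)
iris_tibble <- as_tibble(iris)

# Shared data and mappings for all three histograms
base_hist <- ggplot(iris_tibble, aes(x = Sepal.Length, fill = Species))

# Few bins: a coarse summary of the distribution
coarse <- base_hist + geom_histogram(bins = 5)

# Many bins: finer detail, but noisier bars
fine <- base_hist + geom_histogram(bins = 30)

# Fixed bin width in the units of Sepal.Length
by_width <- base_hist + geom_histogram(binwidth = 0.25)

coarse
fine
by_width
```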