MTH-391A | Spring 2025 | University of Portland
January 31, 2025
Chaining dplyr Verbs Using |>
Load Packages
Define Data Frame as a Tibble
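A minimal sketch of these two setup steps, assuming the dplyr package is installed. The name `iris_tibble` is a hypothetical choice for illustration, not necessarily the name used in lecture.

```r
# Load dplyr, which provides the verbs and re-exports as_tibble() and glimpse()
library(dplyr)

# Convert the built-in iris data frame to a tibble
# (iris_tibble is an illustrative name)
iris_tibble <- as_tibble(iris)

# Inspect the column names and types
glimpse(iris_tibble)
```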
Advanced Example: The goal of this example is to transform the iris dataset by computing the ratio of Petal.Length to Sepal.Length for observations belonging to the “setosa” species.
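One way this chain might look is sketched below, assuming R 4.1 or later for the native pipe `|>`. The names `setosa_ratio` and `petal_sepal_ratio` are hypothetical and chosen only for illustration.

```r
library(dplyr)

setosa_ratio <- as_tibble(iris) |>
  # keep only the setosa observations
  filter(Species == "setosa") |>
  # compute the ratio of petal length to sepal length
  mutate(petal_sepal_ratio = Petal.Length / Sepal.Length) |>
  # keep only the columns relevant to the ratio
  select(Species, Petal.Length, Sepal.Length, petal_sepal_ratio)

# show results
setosa_ratio
```

Each verb receives the output of the previous one as its first argument, so the steps read top to bottom in the order they are applied.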
Summarising by Piping Verbs
The goal in this example is to compute the mean of the Sepal.Length column in each category of the Species column.
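A minimal sketch of this summary, assuming the built-in iris data and R 4.1 or later. The names `sepal_means` and `mean_sepal_length` are illustrative.

```r
library(dplyr)

sepal_means <- as_tibble(iris) |>
  # form one group per species
  group_by(Species) |>
  # collapse each group to a single row holding the group mean
  summarise(mean_sepal_length = mean(Sepal.Length))

# show results: one row per species
sepal_means
```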
You can represent data in multiple ways, and reshaping data frames can be useful when working with datasets that are not in an easily accessible format.
Depending on your goals, restructuring the data can:
The Titanic Data Set: This data set comes with base R in the datasets package, but it is not stored in a familiar R data frame or tibble format.
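The structure printed next can likely be reproduced with base R alone; the sketch below uses `class()`, `dim()`, and `str()` to inspect the object.

```r
# Titanic is available without loading any package beyond base R's datasets
class(Titanic)  # a contingency table, not a data frame
dim(Titanic)    # one dimension per categorical variable

# Print the compact structure, including the dimension names
str(Titanic)
```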
## 'table' num [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
## - attr(*, "dimnames")=List of 4
## ..$ Class : chr [1:4] "1st" "2nd" "3rd" "Crew"
## ..$ Sex : chr [1:2] "Male" "Female"
## ..$ Age : chr [1:2] "Child" "Adult"
## ..$ Survived: chr [1:2] "No" "Yes"
A 4-Dimensional Array: Titanic is a 4-dimensional array, with one dimension per categorical variable. To work with it using dplyr verbs, you can use the as_tibble() function to convert this 4-dimensional array into a data frame in tibble format.
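A minimal sketch of the conversion, assuming dplyr is installed (it re-exports `as_tibble()` and `glimpse()` from the tibble package). The name `titanic_tibble` matches the one used later in these notes.

```r
library(dplyr)

# Convert the 4-dimensional table into a tibble:
# one row per combination of Class, Sex, Age, and Survived,
# with the cell counts stored in a column named n
titanic_tibble <- as_tibble(Titanic)

# Inspect the result
glimpse(titanic_tibble)
```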
## Rows: 32
## Columns: 5
## $ Class <chr> "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1s…
## $ Sex <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Female",…
## $ Age <chr> "Child", "Child", "Child", "Child", "Child", "Child", "Child"…
## $ Survived <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
## $ n <dbl> 0, 0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4, 13, 89, 3, 5…
\(\star\) Notice that the titanic_tibble data frame is a summary: the n column gives the number of cases for each combination of the categorical variables.
What is Pivoting?
pivot_longer(): Converts wide data into a long format.
pivot_wider(): Converts long data into a wide format.
Why Pivot Data?
Class as Columns in the Titanic Tibble
The goal is to create a table that shows the Class categories as columns.
Original tibble
## Rows: 32
## Columns: 5
## $ Class <chr> "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1s…
## $ Sex <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Female",…
## $ Age <chr> "Child", "Child", "Child", "Child", "Child", "Child", "Child"…
## $ Survived <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
## $ n <dbl> 0, 0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4, 13, 89, 3, 5…
Reshaped tibble
# save the reshaped data in a new variable
titanic_wide_class <- titanic_tibble |>
  # spread the Class categories into columns, filled with the counts in n
  pivot_wider(
    names_from = "Class",
    values_from = "n"
  )
# show results
titanic_wide_class
## # A tibble: 8 × 7
## Sex Age Survived `1st` `2nd` `3rd` Crew
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Male Child No 0 0 35 0
## 2 Female Child No 0 0 17 0
## 3 Male Adult No 118 154 387 670
## 4 Female Adult No 4 13 89 3
## 5 Male Child Yes 5 11 13 0
## 6 Female Child Yes 1 13 14 0
## 7 Male Adult Yes 57 14 75 192
## 8 Female Adult Yes 140 80 76 20
The function of pivot_wider(). Figure is from Alistair Bailey (University of Southampton).
Case example: We want to convert titanic_wide_class back into its original form.
Reshaped tibble
## # A tibble: 8 × 7
## Sex Age Survived `1st` `2nd` `3rd` Crew
## <chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 Male Child No 0 0 35 0
## 2 Female Child No 0 0 17 0
## 3 Male Adult No 118 154 387 670
## 4 Female Adult No 4 13 89 3
## 5 Male Child Yes 5 11 13 0
## 6 Female Child Yes 1 13 14 0
## 7 Male Adult Yes 57 14 75 192
## 8 Female Adult Yes 140 80 76 20
Back to the original tibble
## Rows: 32
## Columns: 5
## $ Sex <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Female",…
## $ Age <chr> "Child", "Child", "Child", "Child", "Child", "Child", "Child"…
## $ Survived <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
## $ Class <chr> "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1s…
## $ n <dbl> 0, 0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4, 13, 89, 3, 5…
\(\dagger\) The goal of the demonstration is to use the pivot_longer() function to reshape titanic_wide_class back to its original long form.
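One possible version of this demonstration is sketched below, assuming dplyr and tidyr are installed. The block rebuilds `titanic_tibble` and `titanic_wide_class` so it can run on its own; the name `titanic_long` is hypothetical.

```r
library(dplyr)
library(tidyr)

# Rebuild the long and wide tibbles used in the earlier example
titanic_tibble <- as_tibble(Titanic)
titanic_wide_class <- titanic_tibble |>
  pivot_wider(names_from = Class, values_from = n)

# Gather the four class columns back into a Class / n pair
titanic_long <- titanic_wide_class |>
  pivot_longer(
    cols = c(`1st`, `2nd`, `3rd`, Crew),  # backticks: these names start with a digit
    names_to = "Class",                   # old column names go here
    values_to = "n"                       # old cell values go here
  )

# show results
glimpse(titanic_long)
```

The rows match the original tibble, but Class now appears after Survived because pivot_longer() places the new columns after the columns that were not pivoted.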
Rename the .Rmd file by replacing [name] with your name using the format [First name][Last initial]. Then, open the .Rmd file.