Reshaping One Table

Fundamentals of Data Science

MTH-391A | Spring 2025 | University of Portland

January 31, 2025

Objectives

Previously… (1/2)

Chaining dplyr Verbs Using |>

Load Packages

library(tidyverse)

Define Data Frame as a Tibble

iris_tibble <- as_tibble(iris)

Advanced Example: The goal of this example is to transform the iris dataset by computing the ratio of Petal.Length to Sepal.Length for observations belonging to the “setosa” species.

iris_tibble |>
  # rule 1: choose only the "setosa" species
  filter(Species == "setosa") |>
  # rule 2: pick the columns Sepal.Length and Petal.Length
  select(Sepal.Length, Petal.Length) |>
  # rule 3: create a new column called length_ratio
  mutate(length_ratio = Petal.Length / Sepal.Length)

Previously… (2/2)

Summarising by Piping Verbs

The goal in this example is to compute the mean of the Sepal.Length column in each category of the Species column.

iris_tibble |> 
  # Step 1: group by species
  group_by(Species) |> 
  # Step 2: Calculate the mean of the Sepal.Length column
  #  - mean_sepal_length is the new column for the calculated mean
  summarise(mean_sepal_length = mean(Sepal.Length))

Reshaping Data Frames

You can represent data in multiple ways, and reshaping data frames can be useful when working with datasets that are not in an easily accessible format.

Depending on your goals, restructuring the data can:

Example: Working with Unfamiliar Data Frame Structure

The Titanic Data Set: This data set ships with base R in the datasets package. It is stored as a table object, not as a data frame or tibble.

glimpse(Titanic)
##  'table' num [1:4, 1:2, 1:2, 1:2] 0 0 35 0 0 0 17 0 118 154 ...
##  - attr(*, "dimnames")=List of 4
##   ..$ Class   : chr [1:4] "1st" "2nd" "3rd" "Crew"
##   ..$ Sex     : chr [1:2] "Male" "Female"
##   ..$ Age     : chr [1:2] "Child" "Adult"
##   ..$ Survived: chr [1:2] "No" "Yes"

A 4-Dimensional Array: Titanic is a 4-dimensional array of counts, with one dimension each for Class, Sex, Age, and Survived. To work with it using dplyr verbs, you can use the as_tibble() function to convert the array into a data frame in tibble format.

titanic_tibble <- as_tibble(Titanic)
glimpse(titanic_tibble)
## Rows: 32
## Columns: 5
## $ Class    <chr> "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1s…
## $ Sex      <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Female",…
## $ Age      <chr> "Child", "Child", "Child", "Child", "Child", "Child", "Child"…
## $ Survived <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
## $ n        <dbl> 0, 0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4, 13, 89, 3, 5…

\(\star\) Notice that titanic_tibble is a summary table: each row gives the number of cases, n, for one combination of Class, Sex, Age, and Survived.
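One way to check this reading of the n column is to add the counts back up. The sketch below is an illustration only; the names total_cases and cases_by_survival are made up for this example.

```r
library(tidyverse)

# Convert the 4-dimensional table of counts into a tibble
titanic_tibble <- as_tibble(Titanic)

# Adding up every count gives the total number of cases in the data set
total_cases <- titanic_tibble |>
  summarise(total = sum(n))

# Adding up the counts within each Survived category
cases_by_survival <- titanic_tibble |>
  group_by(Survived) |>
  summarise(total = sum(n))

total_cases
cases_by_survival
```

Because each row is a count and not a single passenger, sum(n) is used here in place of counting rows.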

Pivoting

What is Pivoting?

Why Pivot Data?

Example: Use the Class as Columns in the Titanic Tibble

The goal is to create a table that shows the Class categories as columns.

Original tibble

glimpse(titanic_tibble)
## Rows: 32
## Columns: 5
## $ Class    <chr> "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1s…
## $ Sex      <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Female",…
## $ Age      <chr> "Child", "Child", "Child", "Child", "Child", "Child", "Child"…
## $ Survived <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
## $ n        <dbl> 0, 0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4, 13, 89, 3, 5…

Reshaped tibble

# save the reshaped data in a new variable
titanic_wide_class <- titanic_tibble |> 
  # spread the Class categories across columns, filled with the counts in n
  pivot_wider(
    names_from = "Class",
    values_from = "n"
  )
# show results
titanic_wide_class
## # A tibble: 8 × 7
##   Sex    Age   Survived `1st` `2nd` `3rd`  Crew
##   <chr>  <chr> <chr>    <dbl> <dbl> <dbl> <dbl>
## 1 Male   Child No           0     0    35     0
## 2 Female Child No           0     0    17     0
## 3 Male   Adult No         118   154   387   670
## 4 Female Adult No           4    13    89     3
## 5 Male   Child Yes          5    11    13     0
## 6 Female Child Yes          1    13    14     0
## 7 Male   Adult Yes         57    14    75   192
## 8 Female Adult Yes        140    80    76    20
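A wide layout can also make some calculations easier. As a rough sketch (not part of the original example), widening by Survived instead of Class places the "No" and "Yes" counts side by side, so a survival proportion can be computed with mutate(). The names titanic_wide_survived and prop_survived are assumptions made for this illustration.

```r
library(tidyverse)

titanic_tibble <- as_tibble(Titanic)

titanic_wide_survived <- titanic_tibble |>
  # one column per Survived category, filled with the counts in n
  pivot_wider(
    names_from = "Survived",
    values_from = "n"
  ) |>
  # proportion of cases in each row that survived
  # (rows where No and Yes are both 0 give NaN)
  mutate(prop_survived = Yes / (No + Yes))

titanic_wide_survived
```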

Widening Data Frames

The function of pivot_wider().

Figure is from Alistair Bailey (University of Southampton).
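The idea of widening can also be seen on a very small table. The following is a minimal sketch with made-up data: the tibble scores_long and its columns student, exam, and score are hypothetical and exist only for this illustration.

```r
library(tidyverse)

# Hypothetical long table: one row per student and exam
scores_long <- tibble(
  student = c("A", "A", "B", "B"),
  exam    = c("midterm", "final", "midterm", "final"),
  score   = c(80, 85, 90, 95)
)

# Widen: each exam becomes its own column, filled with the scores
scores_wide <- scores_long |>
  pivot_wider(
    names_from = exam,
    values_from = score
  )

scores_wide
```

Here names_from picks the column whose values become the new column names, and values_from picks the column that fills those new columns.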

Lengthening Data Frames

The function of pivot_longer().

Figure is from Alistair Bailey (University of Southampton).
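Lengthening can be sketched on a very small table as well. The data below are made up: the tibble scores_wide and the names student, midterm, final, exam, and score are hypothetical and used only for this illustration.

```r
library(tidyverse)

# Hypothetical wide table: one row per student, one column per exam
scores_wide <- tibble(
  student = c("A", "B"),
  midterm = c(80, 90),
  final   = c(85, 95)
)

# Lengthen: gather every column except student into name/value pairs
scores_long <- scores_wide |>
  pivot_longer(
    cols = !student,
    names_to = "exam",
    values_to = "score"
  )

scores_long
```

The cols argument selects which columns to gather, names_to names the new column that holds the old column names, and values_to names the new column that holds the values.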

In-Class Demonstrations

Case example: We want to convert titanic_wide_class back into its original long form.

Reshaped tibble

## # A tibble: 8 × 7
##   Sex    Age   Survived `1st` `2nd` `3rd`  Crew
##   <chr>  <chr> <chr>    <dbl> <dbl> <dbl> <dbl>
## 1 Male   Child No           0     0    35     0
## 2 Female Child No           0     0    17     0
## 3 Male   Adult No         118   154   387   670
## 4 Female Adult No           4    13    89     3
## 5 Male   Child Yes          5    11    13     0
## 6 Female Child Yes          1    13    14     0
## 7 Male   Adult Yes         57    14    75   192
## 8 Female Adult Yes        140    80    76    20

Back to the original tibble

## Rows: 32
## Columns: 5
## $ Sex      <chr> "Male", "Male", "Male", "Male", "Female", "Female", "Female",…
## $ Age      <chr> "Child", "Child", "Child", "Child", "Child", "Child", "Child"…
## $ Survived <chr> "No", "No", "No", "No", "No", "No", "No", "No", "No", "No", "…
## $ Class    <chr> "1st", "2nd", "3rd", "Crew", "1st", "2nd", "3rd", "Crew", "1s…
## $ n        <dbl> 0, 0, 35, 0, 0, 0, 17, 0, 118, 154, 387, 670, 4, 13, 89, 3, 5…

\(\dagger\) The goal of the demonstration is to use the pivot_longer() function to reshape titanic_wide_class back into its original long form.
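One possible way to carry out this reshaping is sketched below. It is not necessarily the code used in the demonstration; the name titanic_long_class is an assumption made for this example.

```r
library(tidyverse)

# Rebuild the wide table from the earlier example
titanic_tibble <- as_tibble(Titanic)
titanic_wide_class <- titanic_tibble |>
  pivot_wider(
    names_from = "Class",
    values_from = "n"
  )

# Gather the four Class columns back into a Class column and an n column
titanic_long_class <- titanic_wide_class |>
  pivot_longer(
    cols = c(`1st`, `2nd`, `3rd`, Crew),
    names_to = "Class",
    values_to = "n"
  )

glimpse(titanic_long_class)
```

The backticks around 1st, 2nd, and 3rd are needed because these column names start with a digit. Note that Class now appears after Survived in the column order, since pivot_longer() places the new columns after the columns that were left untouched.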

Activity: Reshape Data by Category

  1. Log in to Posit Cloud and open the RStudio assignment MA5: Reshape Data by Category.
  2. Make sure you are in the current working directory. Rename the .Rmd file by replacing [name] with your name using the format [First name][Last initial]. Then, open the .Rmd file.
  3. Change the author in the YAML header.
  4. Read the provided instructions.
  5. Answer all exercise problems in the designated sections.