Wrangling Temporal Data

Fundamentals of Data Science

MTH-391A | Spring 2025 | University of Portland

March 12, 2025

Objectives

What is Temporal Data?

Temporal data refers to data that is associated with time, meaning it captures changes, trends, or patterns over a period.

Key Characteristics:

Case Study I

New York City Flights

Load Packages

library(tidyverse)    # dplyr, readr, and lubridate (attached by tidyverse 2.0.0 or later)
library(nycflights13)
library(nycflights23)

Data Frames

# bind the two data frames
flights <- nycflights13::flights %>% 
  rbind(nycflights23::flights)

# show size
dim(flights)
## [1] 772128     19

Time as a Dimension/Variable

Example Observations

# take 3 random samples from each year
flights |> 
  select(flight, year, month, day, dep_time, arr_time) |> 
  group_by(year) |> 
  sample_n(3)
## # A tibble: 6 × 6
## # Groups:   year [2]
##   flight  year month   day dep_time arr_time
##    <int> <int> <int> <int>    <int>    <int>
## 1   1716  2013    11     7      600      826
## 2    178  2013    10    30     1252     1356
## 3   1585  2013    12    18     1723     2008
## 4   1132  2023     3     3     1750     1949
## 5     51  2023     2     9      930     1238
## 6    756  2023    12     7     2127       35

\(\star\) Key Idea: Time variables can be treated as either ordinal categorical or discrete numerical, depending on how they are used in the analysis.
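As a small sketch of this idea, the same month information can be pulled out either as a number or as an ordered label. The dates below are invented for illustration, and the label text depends on your locale.

```r
library(lubridate)

# a few hypothetical dates, invented for illustration
d <- ymd(c("2023-12-07", "2023-02-09", "2023-03-03"))

# discrete numerical view: months are numbers we can do arithmetic on
month_num <- month(d)
month_num
range(month_num)

# ordinal categorical view: months are ordered labels (Jan < Feb < ... < Dec)
month_cat <- month(d, label = TRUE)
is.ordered(month_cat)
sort(month_cat)
```

The numerical view suits calculations such as differences, while the ordered-factor view suits grouping and plotting with months in calendar order.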

Time Formats

To get the current date or date-time you can use today() or now():

today()
## [1] "2025-03-13"
now()
## [1] "2025-03-13 21:47:39 PDT"

Four ways time variables can exist:

\(\star\) Key Idea: Time formats can vary widely, so it’s important to recognize different formats and convert them into a proper date/time format when needed.

Very Ambiguous Dates

Example Dates

dates <- "
  date,time
  03/11/25,09:55:00
  11/03/25,08:33:55
  03/25/11,20:22:13
"

\(\star\) Key Idea: Sometimes dates in the data can be very ambiguous. Make sure to refer to the original source of the data and the descriptions of the time variables.
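To see how much the interpretation matters, here is a minimal sketch that parses the first example string three different ways with lubridate. The variable names are only for illustration.

```r
library(lubridate)

ambiguous <- "03/11/25"

as_mdy <- mdy(ambiguous)  # read as month/day/year
as_dmy <- dmy(ambiguous)  # read as day/month/year
as_ymd <- ymd(ambiguous)  # read as year/month/day

as_mdy
as_dmy
as_ymd
```

All three calls succeed, yet they describe three different days, so only the data documentation can say which one is right.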

Formatting Date/Time

The read_csv() function can convert date columns into a date/time data structure as it loads a CSV file, but for ambiguous formats like the ones above you need to specify the format yourself.

Read a CSV and parse the dates in Month / Day / Year format

read_csv(dates, col_types = cols(date = col_date("%m/%d/%y")))
## # A tibble: 3 × 2
##   date       time    
##   <date>     <time>  
## 1 2025-03-11 09:55:00
## 2 2025-11-03 08:33:55
## 3 2011-03-25 20:22:13

Read a CSV and parse the dates in Year / Month / Day format

example_dates <- read_csv(dates, col_types = cols(date = col_date("%y/%m/%d")))
example_dates
## # A tibble: 3 × 2
##   date       time    
##   <date>     <time>  
## 1 2003-11-25 09:55:00
## 2 2011-03-25 08:33:55
## 3 NA         20:22:13

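\(\star\) Key Idea: A format only fails when it produces an invalid date. For example, the date 03/25/11 parsed with the format %y/%m/%d becomes NA because there is no 25th month. A format that parses without error can still be the wrong one, so always check the result against what the data should contain.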
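One way to check a format before trusting it is to count how many values failed to parse. This sketch uses readr's parse_date() on the same three example strings; the object names are hypothetical.

```r
library(readr)

raw <- c("03/11/25", "11/03/25", "03/25/11")

# parse_date() returns NA (with a warning) where the format yields an invalid date
parsed <- suppressWarnings(parse_date(raw, format = "%y/%m/%d"))

# count the failures and list the offending strings
sum(is.na(parsed))
raw[is.na(parsed)]
```

A count of zero does not prove the format is correct, but a nonzero count is a clear sign that it is not.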

Processing Dates Written as Strings

R’s date-time parsing functions are powerful but require careful attention to the date format.

Example Dates

# year-month-day
ymd("2017-01-31")
## [1] "2017-01-31"
# month-day-year
mdy("January 31st, 2017")
## [1] "2017-01-31"
# day-month-year
dmy("31-Jan-2017")
## [1] "2017-01-31"

Converting String Dates into date/time format
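A minimal sketch of this conversion inside a data frame, assuming the dates are stored in a text column. The tibble and its column names are invented for illustration, and month names are parsed according to your locale.

```r
library(dplyr)
library(lubridate)

# hypothetical data frame with dates stored as text
events <- tibble(
  event    = c("a", "b", "c"),
  date_chr = c("January 31st, 2017", "March 11th, 2025", "July 4th, 2023")
)

# mdy() turns the month-day-year strings into a proper Date column
events_parsed <- events |>
  mutate(date = mdy(date_chr))

class(events_parsed$date)
events_parsed
```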

Time of Day

To create a date-time, append an underscore followed by one or more of “h,” “m,” or “s” to the parsing function’s name.

ymd_hms("2025-03-11 20:11:59")
## [1] "2025-03-11 20:11:59 UTC"
mdy_hm("03/11/2025 08:01")
## [1] "2025-03-11 08:01:00 UTC"

You can also convert a date into a date-time by specifying a timezone.

ymd("2025-03-11", tz = "US/Pacific")
## [1] "2025-03-11 PDT"

When no time zone is given, the parsing functions default to the UTC time zone, which you might also know as GMT, or Greenwich Mean Time, the time at 0° longitude. It doesn’t use daylight saving time, making it a bit easier to compute with.
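The difference between changing how a time is displayed and changing the time itself can be sketched with lubridate's with_tz() and force_tz(). The variable names are only for illustration.

```r
library(lubridate)

utc_time <- ymd_hms("2025-03-11 20:11:59", tz = "UTC")

# same instant, shown on a Pacific clock
pacific_view <- with_tz(utc_time, tzone = "US/Pacific")

# same clock reading, reinterpreted as Pacific time (a different instant)
pacific_forced <- force_tz(utc_time, tzone = "US/Pacific")

pacific_view
pacific_forced

# how far apart the two instants are
difftime(pacific_forced, utc_time, units = "hours")
```

Use with_tz() when the recorded instant is correct and only the display should change, and force_tz() when the clock reading is correct but was labeled with the wrong zone.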

Coordinated Universal Time (UTC)

Image Source: CIA World FactBook - World Time

Date/Time into Individual Components

example_dates |> 
  mutate(year=year(date), # year
         month=month(date), # month
         day=day(date), # day
         hour=hour(time), # hour (24-hr format)
         minute=minute(time), # minute
         second=second(time)) # second
## # A tibble: 3 × 8
##   date       time      year month   day  hour minute second
##   <date>     <time>   <dbl> <dbl> <int> <int>  <int>  <dbl>
## 1 2003-11-25 09:55:00  2003    11    25     9     55      0
## 2 2011-03-25 08:33:55  2011     3    25     8     33     55
## 3 NA         20:22:13    NA    NA    NA    20     22     13
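Beyond year, month, and day, lubridate has accessors for other components. A short sketch on a single date, with week_start set explicitly so the numbering does not depend on local options:

```r
library(lubridate)

d <- ymd("2025-03-11")

# day of the week as a number, counting from Sunday = 1
weekday_num <- wday(d, week_start = 7)

# day of the week as an ordered factor (label text depends on locale)
weekday_lab <- wday(d, label = TRUE)

# day of the year
day_of_year <- yday(d)

weekday_num
weekday_lab
day_of_year
```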

Filtering Date/Time Periods

Convert date/time information into date/time format variable

# create subset
flights_sub <- flights |> 
  mutate(date_time = make_datetime(year,month,day,hour,minute)) |> 
  select(flight,date_time)

# view random sample
flights_sub |> 
  sample_n(3)
## # A tibble: 3 × 2
##   flight date_time          
##    <int> <dttm>             
## 1    381 2023-07-24 06:45:00
## 2    677 2023-08-27 12:29:00
## 3    806 2023-07-24 12:17:00

Use the filter() function from the tidyverse

# filter flights that occurred before 2013-03-11
flights_sub |> 
  filter(date_time < ymd("2013-03-11")) |>
  sample_n(3) # view random sample
## # A tibble: 3 × 2
##   flight date_time          
##    <int> <dttm>             
## 1    371 2013-02-08 20:45:00
## 2   4566 2013-03-08 20:05:00
## 3    709 2013-03-04 08:38:00
# filter flights that occurred between May 1 and August 1, 2013
flights_sub |> 
  filter(date_time >= ymd("2013-05-01") & date_time <= ymd("2013-08-01")) |>
  sample_n(3) # view random sample
## # A tibble: 3 × 2
##   flight date_time          
##    <int> <dttm>             
## 1    317 2013-06-28 09:45:00
## 2   1461 2013-05-15 16:00:00
## 3   3285 2013-05-31 18:15:00
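Two other ways to express a date range are sketched below on a small made-up table (flights_demo is a hypothetical stand-in for flights_sub): a half-open interval with date-time endpoints, and a filter on the month component.

```r
library(dplyr)
library(lubridate)

# hypothetical stand-in for flights_sub
flights_demo <- tibble(
  flight    = c(371L, 317L, 1461L, 806L),
  date_time = ymd_hm(c("2013-02-08 20:45", "2013-06-28 09:45",
                       "2013-08-15 16:00", "2023-07-24 12:17"))
)

# all of May through August 2013: start inclusive, end exclusive
summer_2013 <- flights_demo |>
  filter(date_time >= ymd_hm("2013-05-01 00:00"),
         date_time <  ymd_hm("2013-09-01 00:00"))

# May through August of any year, using the month component
summer_any <- flights_demo |>
  filter(month(date_time) %in% 5:8)

summer_2013
summer_any
```

The half-open interval avoids having to decide what the last moment of August 31 is, while the month-component filter ignores the year entirely.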

Time Granularity and Time-Zones
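As one possible illustration of time granularity, lubridate's floor_date() rounds a date-time down to a coarser unit, which makes it easy to count observations per day or per month. The three timestamps below are invented for illustration.

```r
library(dplyr)
library(lubridate)

# hypothetical timestamps
times <- tibble(
  date_time = ymd_hm(c("2023-07-24 06:45", "2023-07-24 12:17", "2023-08-27 12:29"))
)

# round each timestamp down to the start of its day and of its month
by_granularity <- times |>
  mutate(
    day   = floor_date(date_time, unit = "day"),
    month = floor_date(date_time, unit = "month")
  )

# count observations at the monthly level
monthly_counts <- by_granularity |>
  count(month)

by_granularity
monthly_counts
```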

Activity: Working with Date/Time Variables

  1. Log in to Posit Cloud and open the RStudio assignment MA14: Working with Date/Time Variables.
  2. Make sure you are in the current working directory. Rename the .Rmd file by replacing [name] with your name using the format [First name][Last initial]. Then, open the .Rmd file.
  3. Change the author in the YAML header.
  4. Read the provided instructions.
  5. Answer all exercise problems in the designated sections.