MTH-391A | Spring 2025 | University of Portland
March 12, 2025
Temporal data refers to data that is associated with time, meaning it captures changes, trends, or patterns over a period.
Key Characteristics:
New York City Flights
nycflights13 package: On-time data for all flights that departed NYC (i.e. JFK, LGA or EWR) in 2013
nycflights23 package: Similar to nycflights13 but in 2023
Example Observations
# take 3 random samples from each year
flights |>
  select(flight, year, month, day, dep_time, arr_time) |>
  group_by(year) |>
  sample_n(3)
## # A tibble: 6 × 6
## # Groups: year [2]
## flight year month day dep_time arr_time
## <int> <int> <int> <int> <int> <int>
## 1 1716 2013 11 7 600 826
## 2 178 2013 10 30 1252 1356
## 3 1585 2013 12 18 1723 2008
## 4 1132 2023 3 3 1750 1949
## 5 51 2023 2 9 930 1238
## 6 756 2023 12 7 2127 35
\(\star\) Key Idea: Time variables can be treated as either ordinal categorical or discrete numerical, depending on how one views them in a data frame.
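A minimal sketch of the two views, assuming the lubridate package is installed. The date is made up for illustration and is not taken from the flights data. By default month() returns a plain number, while label = TRUE returns an ordered factor.

```r
library(lubridate)

d <- ymd("2025-03-11")  # illustrative date (assumption)

month_num <- month(d)                # discrete numerical view
month_cat <- month(d, label = TRUE)  # ordinal categorical view (ordered factor)

month_num              # the month as a number
is.ordered(month_cat)  # TRUE: the levels run in calendar order
```

Which view is more useful depends on the task: the numeric form suits arithmetic, while the ordered factor suits grouping and plotting in calendar order.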
To get the current date or date-time you can use today() or now():
## [1] "2025-03-13"
## [1] "2025-03-13 21:47:39 PDT"
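Output like the two lines above can be produced by calls along these lines, assuming the lubridate package is loaded. The values depend on when and where the code is run.

```r
library(lubridate)

current_date <- today()  # current date, class "Date"
current_time <- now()    # current date-time, class "POSIXct"

current_date
current_time
```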
Four ways time variables can exist:
tidyverse.
\(\star\) Key Idea: Time formats can vary widely, so it’s important to recognize different formats and convert them into a proper date/time format when needed.
In which of the formats below is the given date written?
Day / Month / Year
Day / Year / Month
Month / Day / Year
Month / Year / Day
Year / Day / Month
Year / Month / Day
In what time format is it written?
\(\star\) Key Idea: Sometimes dates in the data can be very ambiguous. Make sure to refer to the original source of the data and the descriptions of the time variables.
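As a small illustration of that ambiguity, the same string can be read as three different dates depending on the assumed order. The string below is hypothetical, and the sketch assumes the lubridate package.

```r
library(lubridate)

x <- "03/04/05"  # hypothetical ambiguous date string

as_mdy <- mdy(x)  # Month / Day / Year
as_dmy <- dmy(x)  # Day / Month / Year
as_ymd <- ymd(x)  # Year / Month / Day

as_mdy
as_dmy
as_ymd
```

All three calls succeed without complaint, which is why the data documentation, not the parser, has to settle which reading is correct.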
The read_csv function can convert date columns into a date/time data structure when loading a csv file, but unless the dates are already written as YYYY-MM-DD you need to specify the format.
Reading a CSV and setting the dates to Month / Day / Year format
## # A tibble: 3 × 2
## date time
## <date> <time>
## 1 2025-03-11 09:55:00
## 2 2025-11-03 08:33:55
## 3 2011-03-25 20:22:13
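A sketch of how a table like the one above could be produced. The original file is not shown, so the inline CSV text is a stand-in whose values are inferred from the printed output, and the names csv_text and dates_mdy are hypothetical. It assumes readr 2.0 or later, where literal data is wrapped in I().

```r
library(readr)

# Stand-in for the CSV file (assumption: values inferred from the output shown)
csv_text <- "date,time
03/11/25,09:55:00
11/03/25,08:33:55
03/25/11,20:22:13"

dates_mdy <- read_csv(
  I(csv_text),
  col_types = cols(
    date = col_date(format = "%m/%d/%y")  # Month / Day / two-digit Year
  )                                       # other columns are guessed
)

dates_mdy
```

With a real file, the I(csv_text) argument would be replaced by the file path.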
Reading a CSV and setting the dates to Year / Month / Day format
## # A tibble: 3 × 2
## date time
## <date> <time>
## 1 2003-11-25 09:55:00
## 2 2011-03-25 08:33:55
## 3 NA 20:22:13
\(\star\) Key Idea: The formatting works only when the dates are valid under the format you specify. For example, the date 03/25/11 with formatting %y/%m/%d is invalid because there is no 25th month, so it is read as NA.
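The same behavior can be checked on a single string with readr's parse_date(). This is a minimal sketch; the variable names are made up for illustration.

```r
library(readr)

# Valid under Month / Day / Year
ok <- parse_date("03/25/11", format = "%m/%d/%y")

# Invalid under Year / Month / Day: month 25 does not exist,
# so readr returns NA and reports a parsing problem
bad <- suppressWarnings(parse_date("03/25/11", format = "%y/%m/%d"))

ok
bad
```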
R’s date-time specification functions are powerful but require careful attention to the date format.
Example Dates
## [1] "2017-01-31"
## [1] "2017-01-31"
## [1] "2017-01-31"
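Three identical results like these can come from lubridate's order-based parsers applied to differently written strings. The input strings below are assumptions, since the slide shows only the output, and the month names assume an English locale.

```r
library(lubridate)

d1 <- ymd("2017-01-31")          # Year / Month / Day
d2 <- mdy("January 31st, 2017")  # Month / Day / Year
d3 <- dmy("31-Jan-2017")         # Day / Month / Year

d1
d2
d3
```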
Converting String Dates into date/time format
To create a date-time, append an underscore followed by one or more of “h,” “m,” or “s” to the parsing function’s name.
## [1] "2025-03-11 20:11:59 UTC"
## [1] "2025-03-11 08:01:00 UTC"
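Calls of roughly this shape would give the two date-times above. The exact input strings are assumptions, and the sketch assumes the lubridate package.

```r
library(lubridate)

dt1 <- ymd_hms("2025-03-11 20:11:59")  # date plus hours, minutes, seconds
dt2 <- mdy_hm("03/11/2025 08:01")      # date plus hours and minutes only

dt1
dt2
```

When no timezone is given, these functions return the result in UTC.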
You can also convert a date into a date-time by specifying a timezone.
## [1] "2025-03-11 PDT"
By default these parsing functions use the UTC timezone, which you might also know as GMT, or Greenwich Mean Time, the time at 0° longitude. It doesn’t use daylight saving time, making it a bit easier to compute with.
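A short sketch of the effect of the tz argument, assuming the lubridate package. The zone name "America/Los_Angeles" is an assumption chosen to match a Pacific-time result and is not confirmed by the slide.

```r
library(lubridate)

d_utc <- ymd("2025-03-11", tz = "UTC")
d_la  <- ymd("2025-03-11", tz = "America/Los_Angeles")  # assumed zone name

class(d_la)  # a POSIXct date-time rather than a Date
tz(d_utc)
tz(d_la)

# Midnight in Los Angeles falls later than midnight UTC
difftime(d_la, d_utc, units = "hours")
```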
example_dates |>
  mutate(year = year(date),      # year
         month = month(date),    # month
         day = day(date),        # day
         hour = hour(time),      # hour (24-hr format)
         minute = minute(time),  # minute
         second = second(time))  # second
## # A tibble: 3 × 8
## date time year month day hour minute second
## <date> <time> <dbl> <dbl> <int> <int> <int> <dbl>
## 1 2003-11-25 09:55:00 2003 11 25 9 55 0
## 2 2011-03-25 08:33:55 2011 3 25 8 33 55
## 3 NA 20:22:13 NA NA NA 20 22 13
Convert date/time information into date/time format variable
# create subset
flights_sub <- flights |>
  mutate(date_time = make_datetime(year, month, day, hour, minute)) |>
  select(flight, date_time)
# view random sample
flights_sub |>
  sample_n(3)
## # A tibble: 3 × 2
## flight date_time
## <int> <dttm>
## 1 381 2023-07-24 06:45:00
## 2 677 2023-08-27 12:29:00
## 3 806 2023-07-24 12:17:00
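To see what make_datetime() does outside a data frame, here is a minimal sketch with made-up component values, assuming the lubridate package.

```r
library(lubridate)

# Illustrative components (assumptions, not rows from flights)
d  <- make_date(2013, 3, 11)              # year, month, day
dt <- make_datetime(2013, 3, 11, 20, 45)  # plus hour and minute

d
dt
```

Seconds default to 0 and the timezone defaults to UTC unless they are supplied.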
Use the filter() function in tidyverse
# filter flights that occurred before 2013-03-11
flights_sub |>
  filter(date_time < ymd("2013-03-11")) |>
  sample_n(3) # view random sample
## # A tibble: 3 × 2
## flight date_time
## <int> <dttm>
## 1 371 2013-02-08 20:45:00
## 2 4566 2013-03-08 20:05:00
## 3 709 2013-03-04 08:38:00
# filter flights that occurred from May through July (2013-05-01 up to 2013-08-01)
flights_sub |>
  filter(date_time >= ymd("2013-05-01") & date_time <= ymd("2013-08-01")) |>
  sample_n(3) # view random sample
## # A tibble: 3 × 2
## flight date_time
## <int> <dttm>
## 1 317 2013-06-28 09:45:00
## 2 1461 2013-05-15 16:00:00
## 3 3285 2013-05-31 18:15:00
.Rmd file by replacing [name] with your name using the format [First name][Last initial]. Then, open the .Rmd file.