MTH-391A | Spring 2025 | University of Portland
February 7, 2025
Load packages
Importing a (CSV) .csv
file
\(\star\) Note that you need to have
the birthdays.csv
file in your working directory.
Importing a (Excel) .xlxs
file
\(\star\) Note that you need to have
the usa-state-capitals.xlsx
file in your working
directory.
Verb | Function |
---|---|
read_tsv() |
Reads tab-delimited files |
read_delim() |
Reads in files with any delimiter, attempting to automatically guess the delimiter if you don’t specify it |
\(\star\) Note that data sets are commonly saved as CSV and Excel files but there are other filetypes (such as shapefiles) that require special functions, which we will discuss soon.
## # A tibble: 372,555 × 7
## state year month day date wday births
## <chr> <dbl> <dbl> <dbl> <date> <chr> <dbl>
## 1 AK 1969 1 1 1969-01-01 Wed 14
## 2 AL 1969 1 1 1969-01-01 Wed 174
## 3 AR 1969 1 1 1969-01-01 Wed 78
## 4 AZ 1969 1 1 1969-01-01 Wed 84
## 5 CA 1969 1 1 1969-01-01 Wed 824
## 6 CO 1969 1 1 1969-01-01 Wed 100
## 7 CT 1969 1 1 1969-01-01 Wed 90
## 8 DC 1969 1 1 1969-01-01 Wed 88
## 9 DE 1969 1 1 1969-01-01 Wed 32
## 10 FL 1969 1 1 1969-01-01 Wed 288
## # ℹ 372,545 more rows
## # A tibble: 50 × 5
## State ST Capital Longitude Latitude
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Alabama AL Montgomery -86.3 32.4
## 2 Alaska AK Juneau -134. 58.3
## 3 Arizona AZ Phoenix -112. 33.4
## 4 Arkansas AR Little Rock -92.3 34.7
## 5 California CA Sacramento -121. 38.6
## 6 Colorado CO Denver -105. 39.7
## 7 Connecticut CT Hartford -72.7 41.8
## 8 Delaware DE Dover -75.5 39.2
## 9 Florida FL Tallahassee -84.3 30.4
## 10 Georgia GA Atlanta -84.4 33.7
## # ℹ 40 more rows
\(\star\) It looks like the
state
variable in the birthdays
data frame
have categories matching the ST
variable in the
us_capital
data frame.
inner_join()
Joining birthdays
and us_capital
by
matching the state abbreviations
## # A tibble: 365,250 × 11
## state year month day date wday births State Capital Longitude
## <chr> <dbl> <dbl> <dbl> <date> <chr> <dbl> <chr> <chr> <dbl>
## 1 AK 1969 1 1 1969-01-01 Wed 14 Alaska Juneau -134.
## 2 AL 1969 1 1 1969-01-01 Wed 174 Alabama Montgo… -86.3
## 3 AR 1969 1 1 1969-01-01 Wed 78 Arkansas Little… -92.3
## 4 AZ 1969 1 1 1969-01-01 Wed 84 Arizona Phoenix -112.
## 5 CA 1969 1 1 1969-01-01 Wed 824 California Sacram… -121.
## 6 CO 1969 1 1 1969-01-01 Wed 100 Colorado Denver -105.
## 7 CT 1969 1 1 1969-01-01 Wed 90 Connecticut Hartfo… -72.7
## 8 DE 1969 1 1 1969-01-01 Wed 32 Delaware Dover -75.5
## 9 FL 1969 1 1 1969-01-01 Wed 288 Florida Tallah… -84.3
## 10 GA 1969 1 1 1969-01-01 Wed 250 Georgia Atlanta -84.4
## # ℹ 365,240 more rows
## # ℹ 1 more variable: Latitude <dbl>
\(\star\) The result contains only those rows that have matches in both tables.
left_join()
Joining us_capitals
into birthdays
by matching the state abbreviations
## # A tibble: 372,555 × 11
## state year month day date wday births State Capital Longitude
## <chr> <dbl> <dbl> <dbl> <date> <chr> <dbl> <chr> <chr> <dbl>
## 1 AK 1969 1 1 1969-01-01 Wed 14 Alaska Juneau -134.
## 2 AL 1969 1 1 1969-01-01 Wed 174 Alabama Montgo… -86.3
## 3 AR 1969 1 1 1969-01-01 Wed 78 Arkansas Little… -92.3
## 4 AZ 1969 1 1 1969-01-01 Wed 84 Arizona Phoenix -112.
## 5 CA 1969 1 1 1969-01-01 Wed 824 California Sacram… -121.
## 6 CO 1969 1 1 1969-01-01 Wed 100 Colorado Denver -105.
## 7 CT 1969 1 1 1969-01-01 Wed 90 Connecticut Hartfo… -72.7
## 8 DC 1969 1 1 1969-01-01 Wed 88 <NA> <NA> NA
## 9 DE 1969 1 1 1969-01-01 Wed 32 Delaware Dover -75.5
## 10 FL 1969 1 1 1969-01-01 Wed 288 Florida Tallah… -84.3
## # ℹ 372,545 more rows
## # ℹ 1 more variable: Latitude <dbl>
\(\star\) The rows of the first table are always returned, regardless of whether there is a match in the second table.
right_join()
Joining birthdays
into us_capitals
by matching the state abbreviations
states_rightjoined <- birthdays |>
right_join(us_capitals, by = c("state" = "ST"))
states_rightjoined
## # A tibble: 365,250 × 11
## state year month day date wday births State Capital Longitude
## <chr> <dbl> <dbl> <dbl> <date> <chr> <dbl> <chr> <chr> <dbl>
## 1 AK 1969 1 1 1969-01-01 Wed 14 Alaska Juneau -134.
## 2 AL 1969 1 1 1969-01-01 Wed 174 Alabama Montgo… -86.3
## 3 AR 1969 1 1 1969-01-01 Wed 78 Arkansas Little… -92.3
## 4 AZ 1969 1 1 1969-01-01 Wed 84 Arizona Phoenix -112.
## 5 CA 1969 1 1 1969-01-01 Wed 824 California Sacram… -121.
## 6 CO 1969 1 1 1969-01-01 Wed 100 Colorado Denver -105.
## 7 CT 1969 1 1 1969-01-01 Wed 90 Connecticut Hartfo… -72.7
## 8 DE 1969 1 1 1969-01-01 Wed 32 Delaware Dover -75.5
## 9 FL 1969 1 1 1969-01-01 Wed 288 Florida Tallah… -84.3
## 10 GA 1969 1 1 1969-01-01 Wed 250 Georgia Atlanta -84.4
## # ℹ 365,240 more rows
## # ℹ 1 more variable: Latitude <dbl>
\(\star\) Notice here the there
result is the same the using the inner_join()
because the
us_capital
only includes all 50 states while the
birthdays
include all 50 states plus DC.
.Rmd
file by replacing [name]
with your name
using the format [First name][Last initial]
. Then, open the
.Rmd
file.