MTH-391A | Spring 2025 | University of Portland
January 27, 2025
Running R Commands in Different Ways
What are R packages?
Aspect | Base R | R Packages |
---|---|---|
Availability | Comes pre-installed with R | Must be installed and loaded |
Functionality | Offers basic statistical and programming tools | Provides advanced or specialized tools not included in base R |
Customization | Limited to what’s already available | Highly customizable; users can install or even create their own packages |
Performance | Base R can sometimes be slower or more verbose | Packages often include optimized or simpler syntax for complex tasks |
Specialization | Limited to basic statistics and programming | Often built for a specific purpose or domain |
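The "simpler syntax" row can be seen in a small example. This sketch assumes the dplyr package is installed. It selects the same rows of the built-in iris data set twice, once with base R indexing and once with dplyr's filter(). The names base_way and dplyr_way are illustrative only.

```r
library(dplyr)  # assumes dplyr is already installed

# Base R: build a logical vector, index the rows with it, keep all columns
base_way <- iris[iris$Species == "setosa", ]

# dplyr: state the condition directly and refer to the column by name
dplyr_way <- filter(iris, Species == "setosa")

nrow(base_way)   # 50
nrow(dplyr_way)  # 50
```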
tidyverse
The tidyverse package is a collection of packages suited for data processing and visualization.
Core packages specifically for data processing:
dplyr provides a grammar for data transformation.
tidyr provides a set of functions that help you get data into a consistent form.
tibble provides a kind of data frame that prioritizes simplicity, enforcing stricter checks to promote cleaner, more expressive code.
How to install R Packages?
The following two methods install the tidyverse package.
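One common way to install a package from the console is base R's install.packages(). The minimal sketch below assumes an internet connection. The repos URL, which points at the CRAN cloud mirror, is an assumption and can be changed. The requireNamespace() check skips the download when the package is already installed.

```r
# Install tidyverse from CRAN only if it is not already available.
# Assumption: internet access; the CRAN cloud mirror is used as the repository.
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse", repos = "https://cloud.r-project.org")
}
```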
\(\dagger\) Try the above code sequence in your console. Then, install a different package called plotly.
\(\star\) Knitting an R Markdown document that installs packages in a code chunk will likely give you an error or warning message. Run the installation directly in the console instead.
How to load R Packages?
The function library() loads any installed package. In this case, library("tidyverse") loads the tidyverse package specifically.
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
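The Conflicts lines mean that dplyr's filter() and lag() now hide the stats functions with the same names. The small sketch below assumes dplyr is installed and shows two things. First, it checks that a package is attached by looking at search(). Second, it uses the package::function() form to call one specific version regardless of masking. The vector x is illustrative.

```r
library(dplyr)

# Attached packages appear on the search path as "package:<name>"
"package:dplyr" %in% search()   # TRUE

# package::function() names the version explicitly;
# stats::filter() likewise stays reachable by writing it out in full
x <- c(1, 2, 3, 4)
dplyr::lag(x)                   # NA 1 2 3
```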
\(\dagger\) Try the above code sequence in your console. Then, load the package plotly.
How are variables created and stored in R?
\(\dagger\) Create a variable named ria and initially define it as a number. Then, redefine ria as a vector. What happens to the original value of ria (the number) after it is redefined as a vector?
\(\star\) Knowing and keeping track of what variables are defined is key to understanding why some errors occur.
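Here is a short sketch of reassignment that uses a hypothetical variable x, so the exercise above is still left for you to try. Assigning a new value with <- replaces the old value. The ls() function lists the variables currently defined in your workspace, which helps you keep track of them.

```r
x <- 10           # x holds a single number
x <- c(4, 5, 6)   # assigning again replaces the number entirely
x                 # 4 5 6
ls()              # names of the variables defined so far, including "x"
```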
What is a data frame?
Base R: data frames are data structures that come with base R.
Tibbles: tibbles are a special kind of data frame provided by the tibble package.
\(\star\) A base R data frame can be converted to a tibble using the tibble() function or, more commonly, as_tibble(). The iris data set is a built-in data set in the datasets package, which comes with base R.
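The sketch below shows the conversion and assumes the tibble package is installed. It uses as_tibble(), which is tibble's standard conversion function. The name iris_tibble matches the one used later in these notes.

```r
library(tibble)

iris_tibble <- as_tibble(iris)   # convert the base R data frame

class(iris)         # "data.frame"
class(iris_tibble)  # "tbl_df" "tbl" "data.frame"
dim(iris_tibble)    # 150 5
```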
Feature | R Data Frames | Tibbles |
---|---|---|
Printing | Full display | Abbreviated, neat display |
Subsetting | Selecting a single column with [ , j] returns a vector | [ always returns a tibble |
String Handling | Strings could become factors (the default before R 4.0.0) | Strings remain characters |
Column Names | $ allows partial matching | No partial matching; warns about unknown columns |
Error Messages | Less informative | More user-friendly |
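The sketch below illustrates two rows of the table: single-column subsetting with [ , ] and partial matching with $. It assumes the tibble package is installed.

```r
library(tibble)

iris_tibble <- as_tibble(iris)

# Subsetting: one column with [ , "col"] drops to a vector for a data frame,
# but stays a one-column tibble for a tibble
is.vector(iris[, "Sepal.Width"])          # TRUE
is_tibble(iris_tibble[, "Sepal.Width"])   # TRUE

# Column names: for a data frame, $ partially matches "Sepal.W" to Sepal.Width;
# a tibble returns NULL and warns about an unknown column instead
head(iris$Sepal.W, 3)   # 3.5 3.0 3.2
iris_tibble$Sepal.W     # NULL, with a warning
```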
Sub-setting Columns
## num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
## Rows: 150
## Columns: 1
## $ Sepal.Width <dbl> 3.5, 3.0, 3.2, 3.1, 3.6, 3.9, 3.4, 3.4, 2.9, 3.1, 3.7, 3.4…
\(\dagger\) Try the above code examples using a different column on your console!
\(\star\) Here, the glimpse() function is used to simplify the output of the subsets.
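One plausible way to produce output like the above is shown below. It assumes dplyr is installed; dplyr re-exports as_tibble() and glimpse(). The $ operator extracts a column as a plain vector, which str() summarizes. The select() verb keeps the column inside a one-column tibble, which glimpse() summarizes.

```r
library(dplyr)

iris_tibble <- as_tibble(iris)

# $ extracts the column as a plain numeric vector
str(iris_tibble$Sepal.Width)

# select() keeps the column inside a one-column tibble
glimpse(select(iris_tibble, Sepal.Width))
```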
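dplyr
What is dplyr?
dplyr is a powerful R package designed for data processing.
It is part of the tidyverse ecosystem.
Its verbs can be chained together with the pipe operator |>.
Why use dplyr?
It works well with other tidyverse packages such as ggplot2 for visualizations.
dplyr Verbs on Rows
dplyr functions that operate on rows.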
Verb | Purpose | Example |
---|---|---|
filter() | Chooses rows based on conditions | filter(data, col1 > 10) |
arrange() | Reorders rows (ascending by default) | arrange(data, col1) |
distinct() | Keeps only unique rows (or unique combinations of the listed columns) | distinct(data, col1) |
count() | Finds the unique values of the listed columns, then counts the number of occurrences of each | count(data, col1) |
\(\star\) Notice that the data frame data is always the first argument of the verbs in these examples. The filter() verb uses logical operators, which we will discuss in more detail.
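The sketch below applies each row verb from the table to the built-in iris data. It assumes dplyr is installed. The result names, such as long_sepals, are illustrative.

```r
library(dplyr)

# filter(): keep rows where the condition is TRUE
long_sepals <- filter(iris, Sepal.Length > 7)

# arrange(): sort rows by Sepal.Length, smallest first
sorted <- arrange(iris, Sepal.Length)

# distinct(): one row per unique value of Species
species <- distinct(iris, Species)

# count(): number of rows for each unique value of Species
species_counts <- count(iris, Species)
species_counts
```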
dplyr Verbs on Columns
dplyr functions that operate on columns.
Verb | Purpose | Example |
---|---|---|
mutate() | Adds or modifies columns | mutate(data, new_col = col1 - col2) |
select() | Chooses specific columns | select(data, col1, col2) |
rename() | Renames specific columns (new name = old name) | rename(data, new_name = col1) |
relocate() | Moves columns (to the front by default) | relocate(data, col1) |
\(\star\) The = signs in the column verbs are not logical operators; they name the new or renamed columns. Only the filter() verb uses logical operators.
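The sketch below chains the four column verbs step by step on iris. It assumes dplyr is installed. The intermediate names, such as iris_diff, and the column length_diff are hypothetical.

```r
library(dplyr)

# mutate(): add a column computed from existing columns
iris_diff <- mutate(iris, length_diff = Sepal.Length - Petal.Length)

# select(): keep only the listed columns, in the listed order
iris_small <- select(iris_diff, Species, length_diff)

# rename(): the new name goes on the left, the old name on the right
iris_renamed <- rename(iris_small, species = Species)

# relocate(): move length_diff to the front
iris_moved <- relocate(iris_renamed, length_diff)

names(iris_moved)   # "length_diff" "species"
```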
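dplyr Verbs Using |>
What is |>?
The native pipe operator |> (available since R 4.1.0) passes the object on its left as the first argument of the function call on its right, so x |> f(y) is the same as f(x, y).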
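This small check compares a nested call with the same call written using the pipe. It assumes dplyr is installed and R 4.1.0 or later. The names nested and piped are illustrative.

```r
library(dplyr)

# Without the pipe: the data frame is written as the first argument
nested <- filter(iris, Species == "setosa")

# With the pipe: iris is passed in as filter()'s first argument
piped <- iris |> filter(Species == "setosa")

identical(nested, piped)   # TRUE
```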
Define Data Frame as a Tibble
Simple Example. The following code sequence filters the iris data frame (in tibble form) to include only the “setosa” species.
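A minimal sketch of such a code sequence follows. It assumes dplyr is installed and converts iris with as_tibble() under the name iris_tibble.

```r
library(dplyr)

# Convert the built-in data frame to a tibble
iris_tibble <- as_tibble(iris)

# Keep only the rows whose Species is "setosa"
iris_tibble |>
  filter(Species == "setosa")
```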
\(\star\) Notice that the first line is the data frame itself, and the next line is the verb; the data frame is not placed directly in the first argument of the filter() verb. This is a common practice for organizing verbs in a pipeline.
Advanced Example: The goal of this example is to transform the iris dataset by computing the ratio of Petal.Length to Sepal.Length for observations belonging to the “setosa” species.
```r
library(dplyr)

# define the tibble used in this pipeline
iris_tibble <- as_tibble(iris)

iris_tibble |>
  # rule 1: choose only the "setosa" species
  filter(Species == "setosa") |>
  # rule 2: pick the columns Sepal.Length and Petal.Length
  select(Sepal.Length, Petal.Length) |>
  # rule 3: create a new column called length_ratio
  mutate(length_ratio = Petal.Length / Sepal.Length)
```
\(\dagger\) Try the above code sequence in your console with the virginica species, and compute the ratio of Petal.Width to Sepal.Width.
\(\star\) The verbs do not explicitly take the resulting data frames as the first argument because the pipe operator automatically passes the output of the previous step as the input to the next verb in the sequence.
What is a good strategy (and best practice) for transforming a data frame?
Case example: Using the iris dataset, we want to determine the number of rows (or observations) for each species where the length ratio exceeds \(0.80\) and the width ratio exceeds \(0.50\). The length ratio is defined as Petal.Length divided by Sepal.Length, and the width ratio is defined as Petal.Width divided by Sepal.Width.
Original number of rows per species (pre-transformation)
## # A tibble: 3 × 2
## Species n
## <fct> <int>
## 1 setosa 50
## 2 versicolor 50
## 3 virginica 50
Post-transformation results
## # A tibble: 2 × 2
## Species n
## <fct> <int>
## 1 versicolor 2
## 2 virginica 40
\(\dagger\) The goal of the demonstration is to replicate the data frames shown above using dplyr verbs.
The purpose of this activity is for you to start developing proficiency in transforming a data frame using dplyr verbs.
Rename the .Rmd file by replacing [name] with your name using the format [First name][Last initial]. Then, open the .Rmd file.