Intro to Data Science

Lecture 2 – Packages, Structure, & Conditionals

A Guide to Your Process

Scheduling

Learning Objectives

Practice

Supporting Information

Class Discussion

Today’s Plan

  • R Packages
  • Class vs. Structure
  • Using Vectors
  • Conditionals

Today’s Learning Objectives

After today’s session you will be able to:

  • Load and use an R package
  • Define the difference(s) between object class and structure
  • Create and manipulate vectors
  • Write conditional statements
  • Manage missing data in objects with conditionals

R Package Background

  • R packages are groups of functions developed by users


  • Packages have no defined depth or breadth requirements
    • A package could be a single, simple function
    • Or a complex ecosystem of inter-related functions


  • Packages can be installed by any R user for free!


  • R is versatile and powerful (in part) because of contributed packages

Package Locations

  • There are two main homes for R packages


  1. Comprehensive R Archive Network (CRAN)


  2. GitHub

CRAN vs. GitHub

CRAN

  • Currently >20,000 packages live here


  • Strict rules for packages to be allowed


  • These are “official” packages

GitHub

  • Unknown number of packages here (no centralized record retained)


  • No mandatory quality control tests are required for a package to be available here


  • Packages usually work but they don’t have the same quality control as CRAN packages

Using Packages

  • In order to use a package, you must:


  1. Install the desired package
    • Done once per computer


  2. Load the package into R
    • Done every time you re-open RStudio

Using Packages: Specific Steps


# Install desired package
install.packages("dplyr")

# Load that package
library(dplyr)


  • install.packages requires the package name be in quotes


  • library accepts the package name without quotes (a quoted name also works)
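
The install-once, load-every-session pattern above can be sketched as a small check that only installs a package when it is missing. This is an illustrative sketch, not part of the original slides: the variable name pkg_name is hypothetical, and "stats" is used only because it ships with R, so nothing is downloaded. Swap in the package you actually want (e.g., "dplyr").

```r
# Package name stored as a string; "stats" ships with R (assumption for illustration)
pkg_name <- "stats"

# Install only when the package is not already available on this computer
if (!requireNamespace(pkg_name, quietly = TRUE)) {
  install.packages(pkg_name)
}

# library() needs character.only = TRUE when the name is held in a variable
library(pkg_name, character.only = TRUE)

# Confirm the package is now attached to the search path
pkg_attached <- paste0("package:", pkg_name) %in% search()
pkg_attached
```

In an interactive script it is usually simpler to run install.packages once in the Console and keep only the library call in the script.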

Package Analogy

  • install.packages = buying a set of tools from the store and putting them in your home


  • library = moving the tools that you already own to your workbench


  • You only buy the tools once but every time you start work you need to bring them back to your work area!

Practice: Packages

palmerpenguins R package hex logo

  • Make a new script for this week’s lecture!
    • Save it in your RStudio Project folder for this course
    • Make sure it has “lecture” and “2” in the file name


  • Install the palmerpenguins package
    • Remember to put quotes around the package name!


  • Load the package with the library function
    • Once loaded, run ?palmerpenguins to see the package-level help file

Note on Function Names

  • Functions are not required to have unique names across all packages


  • Risk of using a different function than intended
    • Best case: creates an error and forces you to catch the mistake
    • Worst case: silently does something wrong

Function Namespacing

  • Functions can be “namespaced” to specify which package the function comes from


  • Namespacing guarantees you use the function from the desired package


  • Done with two colons (::) between the package and function names
    • E.g., package::function()
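
To see why namespacing matters, here is a minimal sketch that deliberately creates a name clash using only base R. The shadowing function and the object names masked_result and namespaced_result are made up for illustration; in practice clashes usually come from two loaded packages that share a function name.

```r
# Define a function that shadows base R's mean() (purely for illustration)
mean <- function(x) "not the real mean"

# Un-namespaced call: R finds our shadowing function first
masked_result <- mean(c(1, 2, 3))

# Namespaced call: always uses the mean() from the base package
namespaced_result <- base::mean(c(1, 2, 3))

masked_result
namespaced_result

# Remove the shadowing function so mean() behaves normally again
rm(mean)
```

The first call silently returns the wrong thing, while the namespaced call returns 2 as expected.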

Practice: Namespacing

palmerpenguins R package hex logo dplyr R package hex logo

  • palmerpenguins includes an example dataset on penguins
  • Run the following code
    • peng_df <- palmerpenguins::penguins


  • Install and load the dplyr package
    • dplyr has a function called glimpse that shows you core structures of data


  • Namespace glimpse and run it on peng_df

Temperature Check

How are you Feeling?

Comic-style graph depicting someone's emotional state as they debug code (from initial struggle and defeat to eventual triumph)

Class versus Structure

  • Objects have both class and structure


  • Class = the type of object it is
    • E.g., dataframe, integer, character, etc.


  • Structure = the dimensions and “shape” of the data
    • E.g., Number of rows / columns, length, etc.


  • Both class and structure affect what you can do with or to a given object!

Checking Class/Structure

  • Best to check class and structure of an object to ensure functions will work

Check Class

  • Use the class function


# Checking class of 'my_obj'
class(my_obj)

Check Structure

  • Use either the str function or dplyr::glimpse


# Checking structure of 'my_obj' (with base R)
str(my_obj)

# Checking structure of 'my_obj' (with `dplyr`)
dplyr::glimpse(my_obj)
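
As a rough illustration of the difference between class and structure, the sketch below builds a tiny dataframe and checks both. The object toy_df and its values are invented for this example (they are not the real penguins data), and only base R functions are used.

```r
# A small, made-up dataframe (hypothetical values, not the real penguins data)
toy_df <- data.frame(
  species = c("Adelie", "Gentoo", "Chinstrap"),
  body_mass_g = c(3750, 5000, 3700),
  stringsAsFactors = FALSE
)

# Class: what type of object is it?
class(toy_df)              # "data.frame"
class(toy_df$species)      # "character"
class(toy_df$body_mass_g)  # "numeric"

# Structure: dimensions and "shape" of the data
str(toy_df)
nrow(toy_df)  # 3
ncol(toy_df)  # 2
```

Note that the dataframe and each of its columns have their own class, which is why a function may work on one column but fail on another.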

Practice: Structure

  • What is the class of ‘peng_df’?


  • What is the structure?
    • What information is included when you check?


  • What happens when you check the class of a function?
    • Run class(class)


  • What happens when you check the structure of a function?
    • Run str(str)

Using Vectors: Coordinates

  • Vector structure is expressed as “length”
    • Vector length = number of elements in the vector
    • Dataframe length = number of columns (use nrow for the number of rows)


  • Bracket notation can be used to navigate vectors


# Make a vector
my_vec <- c("a", "b", "c", "d", "e")

# Use bracket notation to retrieve one element
my_vec[3]
[1] "c"

Using Vectors: Coordinates Cont.

  • Bracket notation accepts vectors of coordinates


# Use bracket notation to retrieve several elements
my_vec[c(1, 3, 5)]
[1] "a" "c" "e"


  • You can also grab the same element more than once!


# Use bracket notation to retrieve one element multiple times
my_vec[c(1, 1, 1)]
[1] "a" "a" "a"
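
The same bracket notation works on any vector. As a small sketch, the example below uses month.name, a vector of month names built into base R; the object names on the left of each arrow are made up for illustration.

```r
# month.name is a built-in character vector of the twelve month names
n_months <- length(month.name)     # 12

# Retrieve one element by its position
third_month <- month.name[3]       # "March"

# Retrieve several elements with a vector of positions
ends <- month.name[c(1, 12)]       # "January" "December"

# Retrieve the same element more than once
june_twice <- month.name[c(6, 6)]  # "June" "June"

n_months
third_month
ends
june_twice
```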

Practice: Vector Navigation

  • Base R has a built-in vector of letters called letters
    • Assign letters to an object called my_vec


  • Check the length of my_vec using the length function


  • Identify the 10th element of my_vec using bracket notation


  • Identify the 8th, 5th, 12th, 12th (again), and 15th elements of my_vec
    • Use concatenation inside of the brackets (with c)!

Temperature Check

How are you Feeling?

Comic-style graph depicting someone's emotional state as they debug code (from initial struggle and defeat to eventual triumph)

Conditionals

  • You can write code that runs only if an ‘if statement’ is true
    • Otherwise that chunk of code is skipped!


  • This allows you to write flexible code that can handle any outcome that you can anticipate!
    • Particularly useful for subsetting data based on the contents of a column


  • These ‘if statements’ are called conditionals


  • The answer to a conditional must be either TRUE or FALSE
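
A minimal sketch of code that runs only when a conditional is TRUE is shown below. The object names penguin_species and note are hypothetical and exist only for this example.

```r
# A single value to test (made up for illustration)
penguin_species <- "Adelie"

# Default value, used if the conditional is FALSE
note <- "not an Adelie"

# The code inside the braces runs only when the conditional is TRUE
if (penguin_species == "Adelie") {
  note <- "this is an Adelie"
}

note
```

If penguin_species held any other value, the code inside the braces would be skipped and note would keep its default.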

Fundamentals: EQUAL

  • Are two things exactly equal?


# Check a conditional
"hello" == "hello"
[1] TRUE


  • Uses == operator
    • Just two equal signs

Fundamentals: OR

  • Are any of these conditions met?


# Check either one conditional *or* the other
"hello" == "hello" | 2 == 7
[1] TRUE


  • Uses | operator
    • Shift + \ on keyboard

Fundamentals: AND

  • Are all of the conditions met?


# Are *all* conditions TRUE?
"hello" == "hello" & 2 == 7
[1] FALSE


  • Uses & operator
    • Shift + 7 on keyboard

Fundamentals: Summary

EQUAL

  • Are two things exactly equal?


"hello" == "hello"
[1] TRUE


  • Uses == operator
    • Just two equal signs

OR

  • Are any of these conditions met?


"hello" == "hello" | 2 == 7
[1] TRUE


  • Uses | operator
    • Shift + \ on keyboard

AND

  • Are all of the conditions met?


"hello" == "hello" & 2 == 7
[1] FALSE


  • Uses & operator
    • Shift + 7 on keyboard
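
These operators also work element by element on whole vectors, which is what makes them useful for subsetting columns. The sketch below uses two short made-up vectors (species and sex are hypothetical here, not pulled from peng_df).

```r
# Two made-up vectors of equal length
species <- c("Adelie", "Gentoo", "Adelie", "Chinstrap")
sex <- c("male", "male", "female", "female")

# EQUAL: one TRUE/FALSE per element
is_adelie <- species == "Adelie"
# TRUE FALSE TRUE FALSE

# OR: TRUE where either condition is met
adelie_or_gentoo <- species == "Adelie" | species == "Gentoo"
# TRUE TRUE TRUE FALSE

# AND: TRUE only where both conditions are met
male_gentoo <- species == "Gentoo" & sex == "male"
# FALSE TRUE FALSE FALSE

is_adelie
adelie_or_gentoo
male_gentoo
```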

Practice: Fundamental Conditionals

palmerpenguins R package hex logo

  • We’ll use the base R subset function with the peng_df object
    • If needed, consult the help file for more details (?subset)


  • Subset peng_df to only Adelie penguins (and assign to a sub_v1 object)
    • I.e., species == "Adelie"


  • How many rows does that subset have?
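
For reference, the general shape of a subset call looks like the sketch below. It uses a small invented dataframe (toy_df) and a different column than the practice question, so it shows the pattern without answering the exercise.

```r
# A small, made-up dataframe (hypothetical values)
toy_df <- data.frame(
  species = c("Adelie", "Adelie", "Gentoo", "Chinstrap"),
  island = c("Torgersen", "Biscoe", "Biscoe", "Dream"),
  stringsAsFactors = FALSE
)

# Keep only the rows where the conditional is TRUE
biscoe_df <- subset(toy_df, island == "Biscoe")

# Count the rows that remain
nrow(biscoe_df)  # 2
```

The first argument is the dataframe and the second is the conditional, written with the column name unquoted.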

More Practice: Fundamental Conditionals

palmerpenguins R package hex logo

  • Subset peng_df to Adelie or Gentoo penguins
    • Assign this subset to sub_v2 object


  • Subset peng_df to only male Gentoo penguins
    • Assign to sub_v3 object


  • How many rows does that subset have?

Discussion: Conditionals

  • We’ve covered EQUAL, OR, and AND
    • ==, |, and &


  • What unanswered questions do you have?


  • What other types of conditional statements would be useful?
    • Think about it in the context of wanting to subset some data

Numeric Conditionals

  • For numbers, we can use greater/less than conditionals!


  • Greater / less than are expressed as normal
    • > and <


  • Adding ‘or equal to’ is done by adding an equal sign
    • >= and <=
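
A quick sketch of the four numeric operators on a made-up vector (bill_length_mm here is a hypothetical three-value vector, not the real column) shows how the "or equal to" versions differ at the boundary value.

```r
# Made-up measurements; 40 sits exactly on the boundary
bill_length_mm <- c(35, 40, 45)

greater <- bill_length_mm > 40      # FALSE FALSE TRUE
greater_eq <- bill_length_mm >= 40  # FALSE TRUE TRUE
less <- bill_length_mm < 40         # TRUE FALSE FALSE
less_eq <- bill_length_mm <= 40     # TRUE TRUE FALSE

greater
greater_eq
less
less_eq
```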

Practice: Numeric Conditionals

palmerpenguins R package hex logo

  • Subset peng_df to only penguins with a bill length greater than 40 mm
    • Assign to sub_v7


  • Subset peng_df to only penguins with a body mass less than or equal to 4,000 g
    • Assign to sub_v8

Temperature Check

How are you Feeling?

Comic-style graph depicting someone's emotional state as they debug code (from initial struggle and defeat to eventual triumph)

Upcoming Due Dates

Due before lab

(By midnight)

Due before lecture

(By midnight)

Bonus Conditionals

OR with >2 Options

  • OR conditionals with many options get cumbersome quickly
    • E.g., x == 1 | x == 2 | x == 3 | x == 4 …


  • We can use concatenation and the %in% operator to simplify this!


  • %in% is effectively “if any of <this vector> matches the value”
    • E.g., x %in% c(1, 2, 3, 4, …)
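
As a small check that the two forms agree, the sketch below compares a chained OR with the %in% version on a made-up vector. The names x, long_form, and short_form are illustrative only.

```r
# A made-up numeric vector
x <- c(1, 5, 3, 9)

# Chained OR conditionals
long_form <- x == 1 | x == 2 | x == 3 | x == 4

# Same test written with %in% and a concatenated vector of options
short_form <- x %in% c(1, 2, 3, 4)

short_form                        # TRUE FALSE TRUE FALSE
identical(long_form, short_form)  # TRUE
```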

Conditionals: NOT

  • You can also exclude based on conditions
    • Two different ways of doing this


  • For one / a few options: use != for “not equal to”
    • E.g., x != 10


  • Can be combined with %in% to exclude a set of options
    • E.g., !x %in% c(...)
    • Note the exclamation mark is before the object

Practice: Advanced Conditionals

palmerpenguins R package hex logo

  • Subset peng_df to rows where species is any of “Adelie”, “Gentoo”, or “Chinstrap”
    • Use the %in% operator


  • Subset peng_df to all islands except “Dream”