Intro to Data Science

Lecture 2 – Packages & Structure

A Guide to Your Process

Scheduling

Learning Objectives

Practice

Supporting Information

Class Discussion

Today’s Plan

  • R Packages
  • Class vs. Structure
  • Using Vectors

Today’s Learning Objectives

After today’s session you will be able to:

  • Load and use an R package
  • Define the difference(s) between object class and structure
  • Create and manipulate vectors

R Package Background

  • R packages are groups of functions developed by users


  • Packages have no defined depth or breadth requirements
    • A package could be a single, simple function
    • Or a complex ecosystem of inter-related functions


  • Packages can be installed by any R user for free!


  • R is versatile and powerful (in part) because of contributed packages

Package Locations

  • There are two main homes for R packages


  1. Comprehensive R Archive Network (CRAN)


  2. GitHub

CRAN vs. GitHub

CRAN

  • Currently >20,000 packages live here


  • Strict rules for packages to be allowed


  • These are “official” packages

GitHub

  • Unknown number of packages here (no centralized record retained)


  • No mandatory quality control tests are required for a package to be available here


  • Packages usually work but they don’t have the same quality control as CRAN packages

Using Packages

  • In order to use a package, you must:


  1. Install the desired package
    • Done once per computer


  2. Load the package into R
    • Done every time you start a new R session (e.g., each time you re-open RStudio)

Using Packages: Specific Steps


# Install desired package
install.packages("dplyr")

# Load that package
library(dplyr)


  • install.packages requires the package name be in quotes


  • library accepts unquoted package names (quoted names also work)
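
The "install once, load every session" pattern can be sketched as a small helper. This is only an illustration: ensure_package is a hypothetical name that does not appear in the lecture, and it is demonstrated on "stats" (a package that ships with R) so that nothing is actually downloaded.

```r
# Hypothetical helper (not from the lecture): install a package only if it is
# missing, then load it.
ensure_package <- function(pkg) {
  # requireNamespace() returns FALSE when the package is not installed
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  # character.only = TRUE tells library() that 'pkg' holds the name as text
  library(pkg, character.only = TRUE)
  invisible(TRUE)
}

# "stats" ships with R, so this call loads it without installing anything
ensure_package("stats")
```

For everyday scripts, the two plain lines shown above (install.packages once, library at the top of each script) are usually all you need.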

Package Analogy

  • install.packages = buying a set of tools from the store and putting them in your home


  • library = moving the tools that you already own to your workbench


  • You only buy the tools once but every time you start work you need to bring them back to your work area!

Practice: Packages

  • Make a new script for this week’s lecture!
    • Save it in your RStudio Project folder for this course
    • Make sure it has “lecture” and “2” in the file name


  • Install the palmerpenguins package
    • Remember to put quotes around the package name!


  • Load the package with the library function
    • Once loaded, run ?palmerpenguins to see the package-level help file

Note on Function Names

  • Functions are not required to have unique names across all packages


  • Risk of using a different function than intended
    • Best case: creates an error and forces you to catch the mistake
    • Worst case: silently does something wrong

Function Namespacing

  • Functions can be “namespaced” to specify which package the function comes from


  • Namespacing guarantees you use the function from the desired package


  • Done with two colons (::) between the package and function names
    • E.g., package::function()
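
A minimal sketch of why namespacing helps, using only packages that ship with R. The user-defined median below is a deliberately wrong stand-in (an assumption for illustration) for a same-named function from another package.

```r
# A user-defined function that shares its name with stats::median
# (stand-in for a name clash between two packages)
median <- function(x) {
  "not the median you wanted"
}

vals <- c(1, 5, 9)

# Without namespacing, R uses the most recently defined 'median'
wrong <- median(vals)

# With namespacing, R is told exactly which package to look in
right <- stats::median(vals)

wrong
right

# Remove the stand-in so later code uses stats::median again
rm(median)
```

The un-namespaced call silently returns the wrong thing, which is the "worst case" described earlier, while the namespaced call is unaffected by the clash.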

Practice: Namespacing

  • palmerpenguins includes an example dataset on penguins
  • Run the following code
    • peng_df <- palmerpenguins::penguins


  • Install and load the dplyr package
    • dplyr has a function called glimpse that shows you the core structure of a dataset


  • Namespace glimpse and run it on peng_df

Temperature Check

How are you Feeling?

Comic-style graph depicting someone's emotional state as they debug code (from initial struggle and defeat to eventual triumph)

Class versus Structure

  • Objects have both class and structure


  • Class = the type of object it is
    • E.g., dataframe, integer, character, etc.


  • Structure = the dimensions and “shape” of the data
    • E.g., Number of rows / columns, length, etc.


  • Both class and structure affect what you can do with or to a given object!
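
As a small illustration of class affecting what a function can do, the sketch below sums the same three values stored as numbers and as text. The object names are made up for this example, and tryCatch is used only so the script keeps running after the expected error.

```r
# The same values stored with two different classes
num_vec <- c(1, 2, 3)
chr_vec <- c("1", "2", "3")

class(num_vec)
class(chr_vec)

# sum() works on numbers
num_total <- sum(num_vec)

# sum() throws an error on text; tryCatch() swaps that error for NA
chr_total <- tryCatch(sum(chr_vec), error = function(e) NA)

# Converting the text to numbers first makes sum() work again
fixed_total <- sum(as.numeric(chr_vec))
```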

Checking Class/Structure

  • Best to check class and structure of an object to ensure functions will work

Check Class

  • Use the class function


# Checking class of 'my_obj'
class(my_obj)

Check Structure

  • Use either the str function or dplyr::glimpse


# Checking structure of 'my_obj' (with base R)
str(my_obj)

# Checking structure of 'my_obj' (with `dplyr`)
dplyr::glimpse(my_obj)
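
To see these checks on a concrete object, here is a sketch using a tiny data frame in place of 'my_obj'. The name toy_df and its values are made up for illustration and are not taken from any real dataset.

```r
# A small made-up data frame standing in for 'my_obj'
toy_df <- data.frame(
  island = c("Biscoe", "Dream", "Torgersen"),
  bill_mm = c(40.1, 48.5, 38.9)
)

# Class: what type of object is this?
toy_class <- class(toy_df)
toy_class

# Structure: prints the number of rows and columns, plus each column's
# type and its first few values
str(toy_df)
```

If dplyr is installed, dplyr::glimpse(toy_df) reports similar information in a slightly different layout.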

Practice: Structure

  • What is the class of ‘peng_df’?


  • What is the structure?
    • What information is included when you check?


  • What happens when you check the class of a function?
    • Run class(class)


  • What happens when you check the structure of a function?
    • Run str(str)

Using Vectors: Coordinates

  • Vector structure is expressed as “length”
    • Vector length = number of elements in the vector
    • Data frame length = number of columns (use nrow for the number of rows)


  • Bracket notation can be used to navigate vectors


# Make a vector
my_vec <- c("a", "b", "c", "d", "e")

# Use bracket notation to retrieve one element
my_vec[3]
[1] "c"
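
A short sketch of how length behaves on a vector versus a data frame. The data frame toy_df and its values are made up for this example.

```r
# Length of a vector = number of elements
my_vec <- c("a", "b", "c", "d", "e")
vec_len <- length(my_vec)

# A small made-up data frame with 3 rows and 2 columns
toy_df <- data.frame(id = 1:3, score = c(90, 85, 77))

# For a data frame, length() counts columns
df_len <- length(toy_df)

# nrow() counts rows, and dim() gives rows then columns
df_rows <- nrow(toy_df)
df_dim <- dim(toy_df)
```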

Using Vectors: Coordinates Cont.

  • Bracket notation accepts vectors of coordinates


# Use bracket notation to retrieve several elements
my_vec[c(1, 3, 5)]
[1] "a" "c" "e"


  • You can also grab the same element more than once!


# Use bracket notation to retrieve one element multiple times
my_vec[c(1, 1, 1)]
[1] "a" "a" "a"
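
The coordinates can also be stored in their own object and reused. This sketch assumes nothing beyond base R: month.name is a built-in vector of month names, while months_vec, wanted, and picked are names made up for the example.

```r
# Base R's built-in vector of month names
months_vec <- month.name

# Store the coordinates in their own vector (order and repeats are kept)
wanted <- c(12, 1, 1)

# Use that vector inside the brackets
picked <- months_vec[wanted]
picked
```

The result follows the order of the coordinates, not the order of the original vector, so this returns December first and then January twice.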

Practice: Vector Navigation

  • Base R has a built-in vector of letters called letters
    • Assign letters to an object called my_vec


  • Check the length of my_vec using the length function


  • Identify the 10th element of my_vec using bracket notation


  • Identify the 8th, 5th, 12th, 12th (again), and 15th elements of my_vec
    • Use concatenation inside of the brackets (with c)!

Temperature Check

How are you Feeling?

Comic-style graph depicting someone's emotional state as they debug code (from initial struggle and defeat to eventual triumph)

Upcoming Due Dates

Due before lab

(By midnight)

Due before lecture

(By midnight)