Week 3 Slide Decks

Overview

This week we’ll cover the fundamentals of an R Markdown file (as opposed to the R scripts we’ve been covering). We’ll also practice loading external data files into R and then exploring them with both base R tools and functions included in the dplyr package. In lab, we’ll expand our data wrangling conversation to include the pipe operator and how group-wise summarization can be accomplished in R. We’ll finish up with data “shapes” and how data can be re-shaped between wide and long format.

Lecture 3 - R Markdowns & Data Wrangling (P1)

Lecture Slides – Full Screen

Lab 3 - Data Wrangling (P2)

Lab Slides – Full Screen

Homework 3

Learning Objective(s)

Upon completion of this assignment, students will be able to:

  • Explain fundamental aspects of R Markdown files
  • Use data wrangling functions from the dplyr and tidyr packages
  • Rearrange wrangling code using the pipe operator (%>%)

Assignment Due Date(s)

Each homework is due at midnight the day before each lecture (i.e., Monday night) Late work will be accepted but will be subject to the late assignments policy outlined in this course’s syllabus.

Assignment Description

This homework should be submitted as an R markdown with your last name and the week number as the file name (e.g., “Lyon_week3.Rmd”). Remember to specifically load any necessary packages using the library function and include comments explaining what line(s) correspond to each of the following prompts.

  1. In your own words, explain the purpose of the YAML of an R Markdown file. What does it do / what is it for?

  2. Imagine you’re working on an R Markdown file. You set one code chunk to include = FALSE. What will happen to the contents of that code chunk when you knit the file? What parts–if any–of the code chunk will be displayed in the knit file?

  3. You are studying native bee populations and need to wrangle your dataset to be ready for someone else to take on analysis. This question has multiple parts, be sure to answer all components! Each sub-question should be in its own code chunk.

    • Load the dplyr and magrittr packages and read in the example bee data (“bees.csv”) into R and check its structure. How many rows / columns are there?
    • Your advisor reminds you that 2021 was a tough year for field research and those data are likely not reliable. Use the filter function to remove all data on bees identified in 2021.
    • You also realize that your methods are not well-suited to capturing kleptoparasitic bee abundance so those values are not reliable enough to pass on to analysis. Use the select function to remove this column from the subset you just created.
    • You feel that the data you have now are clean enough to continue. However, your collaborator wants a column for the total bee abundance in each year. Use the mutate function to create the column your collaborator has requested from the data version created by the above step.
    • Your collaborator loves the final data product you created! But they looked at your code and they think it can be streamlined. Copy the code you just wrote (filter, select, and mutate) and use the pipe operator (%>%) to write a version that does all three steps without creating intermediary objects. Check the structure of the final object.
  4. Your data wrangling skills are so impressive to your collaborator that they ask for your help with a lichen project they have been working on. This question has multiple parts, be sure to answer all components! Each sub-question should be in its own code chunk.

    • Load the tidyr package and read in the example lichen data (“tree_lichen.csv”) into R and check its structure? What columns are included?
    • Your collaborator tells you that they have already cleaned the data but the data shape isn’t exactly what they need. They want your help reshaping the data into long format. Use the pivot_longer function to reshape their data so that they have the following three columns: “tree” unchanged from the wide format version; “lichen_type” which includes whether the lichen is crustose, foliose, or fruticose; “percent_cover” which includes the percent cover values your collaborator measured in the field.
    • Check the structure of the long format object you created. How many rows are there?