Week 4 Slide Decks

Overview

This week’s lecture focuses on joining data using the four types of join. During lab, we’ll begin exploring statistics in R, starting with the larger context for why we (need to) do statistics and then covering a few of the most common tests and how to run them in R.
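As a rough illustration of what the joins do, here is a minimal sketch using dplyr. It assumes the "four types" are dplyr's mutating joins (left, right, inner, and full); the lecture slides are the authority on that. The data frames `measurements` and `treatments` are made-up toy data, not the homework files. The last step shows `anti_join`, a filtering join that goes beyond those four and may also be worth knowing.

```r
library(dplyr)

# Hypothetical toy data (not the homework files): both tables share a
# "plant" key, but each one contains a plant that the other lacks.
measurements <- data.frame(plant = c(1, 2, 3), fruits = c(4, 0, 7))
treatments <- data.frame(plant = c(2, 3, 4),
                         treatment = c("control", "nitrogen", "nitrogen"))

# Keep every row of the left table; unmatched plants get NA for treatment
left <- left_join(measurements, treatments, by = "plant")

# Keep every row of the right table; unmatched plants get NA for fruits
right <- right_join(measurements, treatments, by = "plant")

# Keep only plants present in both tables
inner <- inner_join(measurements, treatments, by = "plant")

# Keep every plant found in either table
full <- full_join(measurements, treatments, by = "plant")

# Filtering join: keep only left-table rows that have NO match in the right table
flagged <- filter(treatments, treatment == "nitrogen")
unflagged <- anti_join(measurements, flagged, by = "plant")

# Compare row counts to see what each join keeps or drops
sapply(list(left = left, right = right, inner = inner, full = full,
            unflagged = unflagged), nrow)
```

Comparing the row counts (and where `NA` values appear) is a quick way to check whether a join kept or dropped the rows you expected.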

Lecture 4 - Joining Data

Lecture Slides – Full Screen

Lab 4 - Statistics

Lab Slides – Full Screen

Homework 4

Learning Objective(s)

Upon completion of this assignment, students will be able to:

  • Judge the appropriate join to use depending on the question being asked
  • Distinguish between continuous and discrete data
  • Identify the correct statistical test given response and explanatory variable types
  • Perform analysis in R correctly

Assignment Due Date(s)

Each homework is due at midnight the day before each lecture (i.e., Monday night). Late work will be accepted but will be subject to the late assignments policy outlined in this course’s syllabus.

Assignment Description

This homework should be submitted as an R Markdown file with your last name and the week number as the file name (e.g., “Lyon_week2.Rmd”). Remember to explicitly load any necessary packages using the library function and include comments explaining which line(s) correspond to each of the following prompts.

  1. A friend asks for your help wrangling some data for a summer project they did on tomatoes. Your friend tells you the “tomato.csv” file includes data on the number of buds, flowers, and fruits from 10 tomato plants they measured in a greenhouse. These plants either received no treatment or had nitrogen added to their soil. Treatment information is listed in “tomato_treatment.csv”. Finally, some plants had issues that your friend recorded in a third file in case they needed to account for them in the statistics later on. This question has multiple parts; be sure to answer all of them! Each sub-question should be in its own code chunk.
    • Load the dplyr package and read in the three data files they shared with you: (1) “tomato.csv”, (2) “tomato_treatment.csv”, and (3) “tomato_issues.csv”. Check the structure of all three files.
    • Your friend first wants you to attach the treatment information to the main data file. Do this with a join (by plant) so that no rows are lost and check the structure of the resulting object.
    • Your friend has decided they do want to drop some of the plants that had issues after all. They want to remove the problem plants if they had either herbivore damage or a fungal infection (they are okay including over-watered plants in the final data). Using filter and an appropriate join of your choice, remove the plants your friend wants excluded from the data object you created in the previous part of this question. Check the structure of this object.
  2. Let’s revisit the penguin data from the palmerpenguins package to practice some statistics! This question has multiple parts; be sure to answer all of them! Each sub-question should be in its own code chunk.
    • Load the palmerpenguins package and check the structure of the penguins data object.
    • Identify whether each column is “discrete” (aka “categorical”) or “continuous”.
    • Now that you’ve identified the data type of each column, let’s do some statistics! Your hypothesis is that flipper length differs between male and female penguins. Fit the correct test and run summary on the model fit object. Was your hypothesis supported? How do you know? (Remember you can check the “roadmap” we covered in class!)
    • You hypothesize that penguins with a higher body mass have longer flippers. Fit the correct test and run summary on the model fit object. Was your hypothesis supported? How do you know?
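The general workflow of classifying variables, fitting a model, and reading its summary can be sketched on a different data set. The example below uses `ToothGrowth`, which ships with base R, rather than the penguin data. Using `lm` for both comparisons is an assumption made for illustration; the "roadmap" from class is the authority on which test fits which combination of variable types. The names `col_type`, `group_fit`, and `slope_fit` are illustrative only.

```r
# ToothGrowth is part of base R's datasets package, so nothing needs installing
str(ToothGrowth)

# Rule of thumb only: treat numeric columns as continuous and everything else
# as discrete. Numeric codes or counts can still be discrete, so check each
# column's meaning rather than relying on its class alone.
col_type <- sapply(ToothGrowth, function(x) {
  if (is.numeric(x)) "continuous" else "discrete"
})
col_type

# Continuous response ~ discrete explanatory variable with two groups
group_fit <- lm(len ~ supp, data = ToothGrowth)
summary(group_fit)

# Continuous response ~ continuous explanatory variable (simple linear regression)
slope_fit <- lm(len ~ dose, data = ToothGrowth)
summary(slope_fit)

# The p-value for each explanatory term sits in the coefficient table
summary(group_fit)$coefficients["suppVC", "Pr(>|t|)"]
summary(slope_fit)$coefficients["dose", "Pr(>|t|)"]
```

When a hypothesis is directional (for example, "higher X goes with longer Y"), check the sign of the estimate as well as the p-value, since a significant slope in the opposite direction would not support it.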