Intro to Data Science

Lecture 1 – Data Science Fundamentals

A Guide to Your Process

Scheduling

Learning Objectives

Practice

Supporting Information

Class Discussion

Today’s Plan

  • Introductions
  • Data Science Background
  • Why R?
  • Problem Solving Tips (for Coding)
  • Computer File Paths

Today’s Learning Objectives

After today’s session you will be able to:

  • Define “data science”
  • Explain why the course is taught in R
  • Identify useful code problem solving techniques
  • Demonstrate comprehension of computer file paths

About Me

Photo of course instructor. A Caucasian, male-presenting person with glasses in a dark jacket

  • Nick or Professor Lyon (they / them)


  • Education:
    • B.Sc. Biology + M.Sc. Ecology & Evolutionary Biology


  • Career Goals:
    • Marine Biologist → College Faculty → ??? → Data Scientist


  • Hobbies:
    • Dungeons & Dragons; Reading; Hiking; Movies; Videogames

Introductions

Tell me a bit about yourselves!

  • What is your preferred name?

  • What year are you in school?

  • What’s one of your hobbies that brings you joy?

Introductions (Continued)

  • Why did you sign up for the course?

  • What skill(s) are you most excited to learn?

  • What previous coding / data science experience do you have?

    • Absolutely fine if this is your first foray into data science!

My Goal for You

Comic-style graph depicting someone's confidence with R changing over time

Data Science Definition



Data science combines programming and statistics with subject matter expertise to identify patterns and insights hidden in data

Why is this Course Taught in R?

R is a programming language that is awesome for environmental data scientists


Benefits of R:


  • Free
  • Reproducible
  • Accessible
  • Popular
  • Versatile

R’s Popularity

R Data Science Value

Reproducibility

  • R is written in “scripts”
    • Scripts = step-by-step instructions
  • Scripts can be run by any R user
  • Allows perfect replication of process
  • Programs that require clicking buttons are not (as) reproducible
    • Would depend on accompanying written/verbal instructions
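To make "scripts = step-by-step instructions" concrete, here is a minimal sketch of what a short R script could look like. The temperature values and object names are made up purely for illustration; the point is only that anyone who runs these lines gets the same result.

```r
# Step 1: store some hypothetical measurements (made-up values for illustration)
temps_c <- c(12.5, 14.0, 13.2, 15.8)

# Step 2: convert each value from Celsius to Fahrenheit
temps_f <- temps_c * 9 / 5 + 32

# Step 3: summarize the converted values
mean_temp_f <- mean(temps_f)
print(mean_temp_f)
```

Because every step is written down, no extra written or verbal instructions are needed to repeat the process.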

Collaboration

  • R scripts can be co-developed!
  • They can then be shared like a paper draft
  • Many tools exist to formalize sharing
    • We’ll cover one in this course!
  • Unscripted programs would again require written/verbal instructions
    • Then hoping someone clicks the right buttons in the right way

Temperature Check

How are you Feeling?

Comic-style graph depicting someone's emotional state as they debug code (from initial struggle and defeat to eventual triumph)

Problem Solving in R

Small monster blaming a code creature for slipping while in the background other creatures labeled file management, typing, and computer navigation laugh

Problem Solving Methods

  • Problem solving is an important life skill generally
    • Also useful for data science!


  • I do not recommend using AI as a problem-solving method


  • Let’s discuss some useful strategies:
    1. ‘Rubber duck’ method
    2. Google (seriously!)
    3. Teamwork
    4. Take a break
    5. Whatever methods you use!

“AI” Aside

  • I strongly discourage the use of AI tools in this class
    • E.g., ChatGPT, GitHub Copilot, etc.


  • Two primary reasons:
    1. It undermines your learning
    2. There is an ethical dimension we don’t have time to cover in this class


  • However, I’m an educator, not a cop
    • I won’t be policing you in order to enforce my take on AI

Method 1: Rubber Duck

Image of a rubber duck

  • Get a rubber duck (or any small object)


  • Explain each line of your code to the duck
    • Go into as much detail as possible
    • Re-read lines carefully as you explain


  • You’ll catch typos/errors that you had missed!


  • Why “rubber duck” instead of “friend”?
    • Because it would likely be a dull experience for your friend :)

Method 2: Google

Google logo

  • This is a serious suggestion!
    • Google is truly an amazing resource for this



  • If you get an error:
    1. Copy the entire error message
    2. Paste it into Google
    3. Check the first few links to see how others solved that issue
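If you are unsure what counts as "the entire error message", the sketch below triggers a common error on purpose and captures its text. The object name `my_missing_data` is hypothetical and deliberately never created, and the exact wording of the message can vary with your R version and language settings.

```r
# Trigger an error on purpose: 'my_missing_data' (hypothetical name) was never created
error_text <- tryCatch(
  mean(my_missing_data),
  # If an error occurs, keep its message as text instead of stopping the script
  error = function(e) conditionMessage(e)
)

# This is the text you would copy and paste into your search
print(error_text)
```

In everyday use you would not need `tryCatch()`; the same message appears in red in the R console when the error happens.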

Google Tips

Google logo

  • Ignore the “AI Overview”
    • This is frequently wrong and/or misleading
    • Click actual links!


  • Use a plus sign (+) between search terms
    • E.g., “R + <error message text>”


  • When specific wording matters, use quotes!
    • E.g., “I want results with exactly this phrase”

Method 3: Team Up!

  • Group work is a classic method of problem solving


  • Email/text classmates about errors you’re encountering
    • Set up a weekly time to meet and work together


  • Group work & assignments
    • I really encourage you to work together to solve problems
    • BUT assignments should be produced by you alone
    • There are no group assignments in this course!

Method 4: Take a Break

Creature saying they need a minute while the R logo with eyes looks worried in a cloud of error messages

Method 4: Take a Break (Continued)

  • Coding issues can be super frustrating
    • Totally normal to feel this way


  • If you are struggling to solve a problem, take a few minutes to step away and reset
    • Physically step away and do something active
    • Do one of your hobbies for a few minutes
    • Work on something else


  • Return to the problem an hour or so later and try again!

Method 5: Yours!

  • As students, you’re experienced problem solvers already!


  • Code problems can likely be solved by the strategies you already use!


  • How do you solve problems you encounter in other courses or at work?

Temperature Check

How are you Feeling?

Comic-style graph depicting someone's emotional state as they debug code (from initial struggle and defeat to eventual triumph)

Computer File Paths

  • Computers store files in “folders”
    • Folders can be nested inside other folders


  • The names of all the folders leading to a particular file make up that file’s “file path”
    • File path starts at the biggest folder (“top” folder) and ends at the file
    • Folder names are separated by slashes (\ on Windows, / on macOS and Linux)


  • For example: ~ / Downloads / BIO-316_syllabus.docx
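As a rough sketch of how this example path looks in R, the base function `file.path()` joins folder and file names with forward slashes, and `basename()` pulls out just the file name. The object names `syllabus_path` and `path_parts` are only for illustration.

```r
# Build the example path piece by piece; file.path() adds the "/" separators
syllabus_path <- file.path("~", "Downloads", "BIO-316_syllabus.docx")
print(syllabus_path)

# Split the path back into its pieces: each folder, then the file
path_parts <- strsplit(syllabus_path, split = "/", fixed = TRUE)[[1]]
print(path_parts)

# The last piece of a file path is the file itself
print(basename(syllabus_path))
```

The `~` at the start is a shortcut for your home folder, so the same path works for different users on different computers.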

File Path Example

What is the file path for the notes document in this image?

Diagram of hierarchically nested folders with a 'downloads' folder and 'documents' folder at the top where 'documents' contains a 'bio 101' and 'bio 316' folder and the 'bio 316' folder contains a notes document

~ / Documents / BIO 316 / BIO316-Notes-Week 1.docx

Practice: File Paths

  • Pick a file on your computer
    • Not one in the “Downloads” folder (those file paths are too short)


  • What is that file’s path?
    • Hint: ~ ///


  • When you have it, show me the file and tell me its path
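Once R is installed, one possible way to double-check a path you wrote down is to ask R whether a file exists there. This is only a sketch: the path below reuses the notes document from the earlier example as a stand-in, so swap in the path of the file you picked.

```r
# Stand-in path from the earlier example; replace it with your own file's path
my_path <- file.path("~", "Documents", "BIO 316", "BIO316-Notes-Week 1.docx")

# TRUE if a file exists at that path on this computer, FALSE otherwise
path_found <- file.exists(my_path)
print(path_found)

# Show the path with "~" replaced by your actual home folder
full_path <- path.expand(my_path)
print(full_path)
```

A result of `FALSE` usually means a typo in a folder or file name, which is a good moment to re-read the path carefully.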

Upcoming Due Dates

Due before lab

Due ASAP

  • Install R (see here)
  • Install RStudio (see here)
  • Read the syllabus (esp. point values + assignment descriptions)

Due by midnight

  • Muddiest Point #1

Due before lecture

(By midnight)