Data Visualization Basics

Nick J Lyon

A Guide to Your Process

Scheduling

Learning Objectives

Practice

Supporting Information

Class Discussion

Today’s Plan

  • Types of Data
  • Aside: Tidy Data
  • Anatomy of a Graph
  • Visualization Goals
  • Graph Types

Today’s Learning Objectives

After today’s session you will be able to:

  • Define two major types of data
  • Identify characteristics of “tidy” data
  • Define fundamental anatomy of a graph
  • Explain how visualization is affected by what message you want to convey
  • Paraphrase how you can choose what type of graph to use

Graphing Overview

  • Before you can get into graphing, you need to know:


  1. What type(s) of data you have


  2. What message you want to share with your audience

Types of Data

Continuous

  • Data must be numbers


  • Infinite options within possible range


  • For example: height, length, profit

Categorical

  • Data may be numbers


  • Options limited to distinct values, with nothing in between


  • For example: counts, satisfaction ratings

Types of Data - Comic

Comic with a bird on the left under the 'continuous' heading, next to some non-integer number examples, and an octopus on the right under the 'discrete' heading, next to some integer examples

Data Format Aside

  • In order to make a graph, your data need to be “tidy”


  • But what does it mean for data to be “tidy”?


  • Let’s dig into that

What is ‘Tidy Data’?

  • One row = one observation


  • One column = one variable


  • One cell = one data point

Cartoon data table with one column labeled as 'variable', one row as 'observation', and two cells as 'data point'

Tidy vs. Un-Tidy Comic

Cartoon of several brightly-colored, anthropomorphic tables with a speech bubble saying 'we are all tidy' above several other tables in dull colors and various, irregular shapes with a speech bubble saying 'we are untidy in different ways'

Un-Tidy Example 1

Screen capture of an untidy data table in MS Excel where several sub-tables are included in different places on the same sheet
  • Is every column a variable? No!
  • Is every row an observation? No!

Un-Tidy Example 2



  • Is every column a variable? Yep
  • Is every row an observation? No!

Fixing Un-Tidy Data

  • What if you realize your data are not tidy?


  • Ideally, you’d use a programming language (e.g., R, Python) to fix the data
    • This is called wrangling


  • If you don’t code, carefully copying and pasting is okay
    • I strongly recommend saving an untouched copy of the original before doing this!
    • That way you can check your work if you make a mistake
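
As a rough illustration of wrangling, here is a minimal Python sketch that reshapes a small un-tidy table into a tidy one. The table contents and the `to_tidy` helper are hypothetical and invented for this example. In the un-tidy version each year has its own column, so a single row holds several observations; in the tidy version each row is one observation.

```python
# Hypothetical un-tidy table: one row per site, one column per year
wide_rows = [
    {"site": "A", "2021": 14, "2022": 17},
    {"site": "B", "2021": 9, "2022": 11},
]


def to_tidy(rows, id_column, variable_name, value_name):
    """Turn every non-ID column of every row into its own observation."""
    tidy = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the ID column is repeated on each new row instead
            tidy.append(
                {id_column: row[id_column], variable_name: column, value_name: value}
            )
    return tidy


tidy_rows = to_tidy(wide_rows, "site", "year", "count")

# One row = one observation, one column = one variable
for row in tidy_rows:
    print(row)
```

Data-wrangling libraries offer ready-made versions of this reshape (for example `melt` in pandas or `pivot_longer` in R's tidyr), which are usually a better choice than a hand-written loop.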

Anatomy of a Graph - P1

Scatterplot showing a negative relationship between the Y and X axes

Anatomy of a Graph - P2

Same scatterplot as in the prior image but with a box around the vertical axis (labeled 'y axis/response variable'), another around the horizontal axis (labeled 'x axis/explanatory variable'), and a final one around the plot area (labeled 'plot area')

Anatomy of a Graph - P3

Same scatterplot as in the prior image but with boxes around the axis labels, the axis tick labels, and the tick marks themselves
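
To connect these parts of a graph to code, here is a minimal sketch that assumes the matplotlib library is installed. The data are hypothetical and chosen only to show a negative relationship; each call sets one of the parts named above (axis labels, tick marks, tick labels).

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Hypothetical data with a negative relationship
x_values = [1, 2, 3, 4, 5, 6]
y_values = [12, 10, 9, 7, 4, 3]

fig, ax = plt.subplots()

# The points are drawn inside the plot area
ax.scatter(x_values, y_values)

# Axis labels: explanatory variable on x, response variable on y
ax.set_xlabel("Explanatory variable (x axis)")
ax.set_ylabel("Response variable (y axis)")

# Tick marks go at these positions; the tick labels are the numbers shown there
ax.set_xticks([0, 2, 4, 6])
ax.tick_params(axis="both", length=6)  # length of the tick marks themselves

# Render to an in-memory PNG (swap in a filename to save to disk)
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
```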

Choosing the Right Graph

  • There are a lot of different types of graphs you can make


  • As you work more with data, you will hone your intuition for which type suits a given context


  • For now, let’s consider a simplified ‘roadmap’ to help you as you start your data visualization journey!

Graph Choice Roadmap


Visualization roadmap with a series of yes/no branching lines between colored boxes. The top box says 'is my explanatory variable categorical?'. If 'yes', another box asks 'is the range of the data what's important?'. If 'yes', then boxplot/violin plot; if 'no', then bar graph. If the explanatory variable is not categorical, then another box asks 'do I care about the trend?'. If 'yes', then scatterplot plus best fit line; if 'no', then just a scatterplot.
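
The roadmap's yes/no questions can also be written as a short function. This is only a sketch of the simplified roadmap: the function name `choose_graph` and its parameter names are invented for this example.

```python
def choose_graph(explanatory_is_categorical, range_matters=False, trend_matters=False):
    """Return the graph type the simplified roadmap suggests."""
    if explanatory_is_categorical:
        # Is the range of the data what's important?
        if range_matters:
            return "boxplot or violin plot"
        return "bar graph"
    # Explanatory variable is not categorical: do I care about the trend?
    if trend_matters:
        return "scatterplot with best fit line"
    return "scatterplot"


# Example: categorical explanatory variable where the range is the key message
print(choose_graph(True, range_matters=True))
```

Each question only matters on its own branch: `range_matters` is ignored when the explanatory variable is not categorical, and `trend_matters` is ignored when it is.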

Aside: Categorical Response Variable

  • You may notice that the prior slide excluded the possibility of a categorical response


  • If both your explanatory and response variables are categorical, you likely will want a table instead of a graph


  • Or something that is technically a scatterplot but has relatively few points
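
As one possible version of such a table, here is a minimal sketch that counts how often each pairing of two categorical variables occurs. The survey data and category names are hypothetical.

```python
from collections import Counter

# Hypothetical survey: each tuple is one observation (explanatory, response)
observations = [
    ("urban", "satisfied"),
    ("urban", "unsatisfied"),
    ("rural", "satisfied"),
    ("urban", "satisfied"),
    ("rural", "satisfied"),
]

# Count how many observations fall into each combination of categories
counts = Counter(observations)
row_labels = sorted({site for site, _ in observations})
column_labels = sorted({rating for _, rating in observations})

# Header row, then one row per category of the explanatory variable
print("site".ljust(8) + "".join(label.ljust(14) for label in column_labels))
for site in row_labels:
    cells = "".join(str(counts[(site, rating)]).ljust(14) for rating in column_labels)
    print(site.ljust(8) + cells)
```

Combinations that never occur show up as 0 because a `Counter` returns 0 for missing keys.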

Graph Explanation: Scatterplot

Scatterplot Example

Graph Explanation: Violin Plot

Violin Plot Example

Violin Plot vs. Boxplot

  • Why use one versus the other?
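
One common reason to prefer a violin plot is that a boxplot's summary statistics can hide the shape of the data. The sketch below uses hypothetical measurements that fall into two separate clusters: the median, which a boxplot draws as its center line, lands in a gap where there are no observations at all. A violin plot of the same data would show two bulges instead.

```python
import statistics

# Hypothetical measurements with two separate clusters
values = [1, 2, 2, 3, 3, 4, 8, 9, 9, 10, 10, 11]

# A boxplot's box and center line are drawn from these three quartiles
first_quartile, median, third_quartile = statistics.quantiles(values, n=4)

# Count the observations that sit within 1 unit of the median
near_median = [v for v in values if abs(v - median) <= 1]

print("median:", median)
print("observations near the median:", len(near_median))
```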

Graph Explanation: Bar Graph

Bar Graph Example

Pop-Quiz: Graph Choices!

  • Let’s run through some examples!


  • Raise your hand if you think you know the proper graph type to use in each of the following examples

Thanks! Questions?