R Code Tutorials

The tutorials below all focus on neat features of the R programming language. See the sub-headings for more specifics!

Class Coercion

The R programming language is extremely useful for a variety of data science tasks. Like other object-oriented programming languages, it lets you store values in “objects” and then use those objects to recall the values to which they are bound. To combine values of different “types”, R has to “coerce” one or both of them into a shared type (sometimes called a “class”, depending on what you’re working on).

Coercion Rules

The order of this coercion is logical → integer → double → character. Logicals are the most specific type of atomic vector, and the order proceeds to characters, which are the most general type. When values of different types are combined, all of them are coerced to the most general type present. It might be helpful to consider some examples. Let’s begin by making one object for each of the atomic types identified above.

my_logi <- c(TRUE, FALSE)
typeof(x = my_logi)

[1] "logical"

my_int <- c(4L, 5L, 6L)
typeof(x = my_int)

1: The L at the end marks the number as an integer and ensures that it is stored as type integer rather than double

[1] "integer"

my_doub <- c(3.14, 77.0, 20)
typeof(x = my_doub)

2: Even though “20” looks like an integer, R will consider it a double without the trailing L

[1] "double"

my_char <- c("a", "b", "c")
typeof(x = my_char)

[1] "character"

Now that we have those, let’s combine them in sequence so we can see the coercion rules in action!

my_logi.int <- c(my_logi, my_int)
my_logi.int

[1] 1 0 4 5 6

typeof(x = my_logi.int)

[1] "integer"

my_int.doub <- c(my_int, my_doub)
my_int.doub

[1]  4.00  5.00  6.00  3.14 77.00 20.00

typeof(x = my_int.doub)

[1] "double"

my_doub.char <- c(my_doub, my_char)
my_doub.char

[1] "3.14" "77"   "20"   "a"    "b"    "c"

typeof(x = my_doub.char)

[1] "character"
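The same ordering applies when types that are further apart are combined directly; R does not step through the intermediate types first. A minimal sketch (the object names here are illustrative only and are not used elsewhere in this tutorial):

```r
# Combine a logical directly with a character
skip_char <- c(TRUE, "a")
skip_char            # "TRUE" "a" (the logical does not become "1")
typeof(x = skip_char)  # "character"

# Combine all four atomic types at once; the most general type wins
all_four <- c(FALSE, 2L, 3.5, "z")
all_four             # "FALSE" "2" "3.5" "z"
typeof(x = all_four)   # "character"
```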

Other Coercion Variants

You may have noticed that the above examples are missing some classes of object with which you may work regularly. These were absent from the above examples because the first component of this tip is restricted only to “type” coercion while you may be thinking of “class” coercion. See below for some examples that may address what felt missing above.

“Numeric” values are technically inclusive of both integers and doubles. The reason to avoid that phrasing earlier was just to be more precise about the coercion rules between integers and doubles.
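As a quick illustration of that overlap, the sketch below checks an integer and a double with is.numeric, typeof, and class. The literal values are arbitrary.

```r
# Both integers and doubles count as "numeric"
is.numeric(4L)    # TRUE
is.numeric(3.14)  # TRUE

# Their underlying types still differ
typeof(x = 4L)    # "integer"
typeof(x = 3.14)  # "double"

# class() reports doubles as "numeric" and integers as "integer"
class(x = 4L)     # "integer"
class(x = 3.14)   # "numeric"
```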

Factors are a special case of an integer. This does mean that coercing a factor can have surprising results in some cases.

# Make our character object into a factor
my_fact <- as.factor(x = my_char)
my_fact

[1] a b c
Levels: a b c

# Check the type & class
typeof(x = my_fact)

[1] "integer"

class(x = my_fact)

[1] "factor"

# Coerce it by combining with a double
my_doub.fact <- c(my_doub, my_fact)
my_doub.fact
typeof(x = my_doub.fact)

3: Because doubles “win” coercion against integers, our a, b, and c become 1.00, 2.00, and 3.00 respectively!

[1]  3.14 77.00 20.00  1.00  2.00  3.00
[1] "double"
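If you want the factor's labels rather than its integer codes, one common approach is to convert the factor to character before combining. The sketch below re-creates the objects so that it runs on its own; num_fact is a hypothetical factor added only for illustration.

```r
# Re-create the objects from above
my_doub <- c(3.14, 77.0, 20)
my_fact <- as.factor(x = c("a", "b", "c"))

# Convert the factor to character first so the labels survive
labels_kept <- c(my_doub, as.character(my_fact))
labels_kept  # "3.14" "77" "20" "a" "b" "c"

# For factors whose labels look like numbers, go through character too
num_fact <- factor(x = c("10", "20", "30"))
as.numeric(num_fact)                # 1 2 3 (the integer codes)
as.numeric(as.character(num_fact))  # 10 20 30 (the labels)
```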

Dates are a special case of a double. They represent the number of days since January 1st, 1970. Like factors, this means that coercion can behave in a way that surprises you.

# Make a date
my_date <- as.Date(x = "2024-10-13")

# Check the type & class
typeof(x = my_date)

[1] "double"

class(x = my_date)

[1] "Date"

# Coerce it by combining with a character
my_char.date <- c(my_char, my_date)
my_char.date

[1] "a"     "b"     "c"     "20009"

typeof(x = my_char.date)

[1] "character"
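To keep the human-readable form, one option is to convert the date to character yourself before combining. The sketch below re-creates the objects so that it is self-contained and also shows the day count being turned back into a date.

```r
# Re-create the objects from above
my_char <- c("a", "b", "c")
my_date <- as.Date(x = "2024-10-13")

# The day count that underlies the date
as.numeric(my_date)  # 20009

# Convert to character before combining to keep the readable form
char_date_kept <- c(my_char, as.character(my_date))
char_date_kept  # "a" "b" "c" "2024-10-13"

# Recover a date from a day count
as.Date(x = 20009, origin = "1970-01-01")  # "2024-10-13"
```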

Date-times are a special case of a double. They represent the number of seconds since January 1st, 1970. Just like dates, this can make coercion surprising here as well.

# Make a datetime
my_datetime <- as.POSIXct("2024-10-13 23:00", tz = "UTC")

# Check the type & class
typeof(x = my_datetime)

[1] "double"

class(x = my_datetime)

[1] "POSIXct" "POSIXt"

# Coerce it by combining with a character
my_char.datetime <- c(my_char, my_datetime)
my_char.datetime

[1] "a"          "b"          "c"          "1728860400"

typeof(x = my_char.datetime)

[1] "character"
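As with dates, formatting the date-time as text yourself avoids the surprise. This is a minimal sketch; the format string and the explicit UTC time zone are choices made for this illustration.

```r
# Re-create the objects from above
my_char <- c("a", "b", "c")
my_datetime <- as.POSIXct("2024-10-13 23:00", tz = "UTC")

# The second count that underlies the date-time
as.numeric(my_datetime)  # 1728860400

# Format as text (in a stated time zone) before combining
datetime_text <- format(my_datetime, format = "%Y-%m-%d %H:%M", tz = "UTC")
c(my_char, datetime_text)  # "a" "b" "c" "2024-10-13 23:00"

# Recover a date-time from a second count
recovered <- as.POSIXct(1728860400, origin = "1970-01-01", tz = "UTC")
```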

Signaling Conditions

Many programming languages rely on being able to signal “conditions” when code doesn’t work as intended. These conditions range from effectively ‘for your information’ notes all the way to full-blown errors. The three most common conditions in R are messages, warnings, and errors.

Messages indicate that an action has been taken on the user’s behalf, not necessarily that there is a problem. They can be useful to explicitly inform a user about an assumed default value or, for code that iterates for a long time, to reassure users that the function is still working.

message("Pssst")

Pssst
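As a sketch of the “assumed default” use case, the hypothetical function below (summarize_values is not part of any package) sends a message when it falls back on a default. suppressMessages silences it for users who do not want the note.

```r
# Hypothetical helper that announces when it falls back on a default
summarize_values <- function(values, digits = NULL) {
  if (is.null(digits)) {
    message("No 'digits' supplied; rounding to 2 decimal places")
    digits <- 2
  }
  round(mean(values), digits = digits)
}

# Prints the message, then returns the rounded mean
summarize_values(values = c(1.234, 5.678))

# Same result with the message silenced
quiet_result <- suppressMessages(summarize_values(values = c(1.234, 5.678)))
quiet_result  # 3.46
```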

Warnings indicate that something has gone wrong but the function could at least partially recover. These can be useful when some facet of a user’s input is incorrect but the code can still complete. I often use warnings in my custom functions that have at least one argument that expects a logical (i.e., TRUE or FALSE). If the user supplies anything other than a logical, I return a warning and reset that argument to whatever default logical I originally defined.

warning("Oops")

Warning: Oops
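The logical-argument pattern described above might look something like the sketch below. The function greet and its shout argument are hypothetical and exist only for this illustration.

```r
# Hypothetical function with an argument that expects a logical
greet <- function(name, shout = FALSE) {

  # If `shout` is not a single TRUE/FALSE, warn and fall back on the default
  if (!is.logical(shout) || length(shout) != 1 || is.na(shout)) {
    warning("'shout' must be TRUE or FALSE; defaulting to FALSE")
    shout <- FALSE
  }

  greeting <- paste0("Hello, ", name)
  if (shout) toupper(greeting) else greeting
}

# Valid input: no warning
greet(name = "Ada", shout = TRUE)   # "HELLO, ADA"

# Invalid input: a warning is raised but a value is still returned
recovered <- suppressWarnings(greet(name = "Ada", shout = "yes"))
recovered  # "Hello, Ada"
```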

Errors indicate that the function cannot continue and execution must stop. Including custom input checks with informative error messages is an important facet of package development! And in non-function code, error messages are your first indication that something is not working as it should.

stop("Oh no")
#> Error: Oh no

1: I included what this looks like as a commented-out line because otherwise the website can’t render this page.
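A custom input check of the kind mentioned above could be sketched as follows. safe_sqrt is a hypothetical function, and wrapping the call in tryCatch is just one way to let the rest of a script keep running after an error.

```r
# Hypothetical function with a custom input check
safe_sqrt <- function(x) {
  if (!is.numeric(x)) {
    stop("'x' must be numeric, not ", class(x)[1])
  }
  sqrt(x)
}

# tryCatch intercepts the error so later code can still run
result <- tryCatch(
  safe_sqrt(x = "nine"),
  error = function(cnd) {
    # Report what went wrong and return a placeholder value
    message("Caught: ", conditionMessage(cnd))
    NA_real_
  }
)
result  # NA
```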

Looping Across Integers

When iterating a given operation it is common to loop across some integer. For example, maybe you’re looping across a list and want to use the numeric position of each element of the list. Typically, this is accomplished like so:

# Define vector
my_vec <- c("a", "b", "c")

# Loop across it
for(k in 1:length(my_vec)){
  
  # Print the kth letter
  cat("Processing ", my_vec[[k]], "\n", sep="")
  
}

Processing a
Processing b
Processing c

This works in this case, but if the vector of values has no elements, the loop will behave unexpectedly. This is because 1:length(x) on an empty vector evaluates to 1:0, which counts down and returns 1 and 0! Let’s demonstrate this here:

# Make an empty vector
empty_vec <- c()

# Loop across it
for(k in 1:length(empty_vec)){
  
  # Print the kth letter
  cat("Processing ", empty_vec[[k]], "\n", sep="")
  
}

Processing 
Processing

# Demonstrate how the loop interpreted the `1:length` bit
1:length(empty_vec)

[1] 1 0

See how the loop still appears to work but isn’t returning values that might be expected? This can be especially challenging to debug with a more complex (i.e., more realistic) loop. However, we can reformat the first part of the loop to use seq_along instead of 1:length. With an empty vector the loop body then never runs at all, so nothing misleading is printed and it is clearer that the issue is with your initial vector of inputs.

# Loop across the empty vector
for(k in seq_along(empty_vec)){
  
  # Print the kth letter
  cat("Processing ", empty_vec[[k]], "\n", sep="")
  
}

# Demonstrate how the loop interpreted the `seq_along` bit
seq_along(empty_vec)

1: Technically, seq_along has an along.with argument but for conciseness I’ve let it be implicit in this demo

integer(0)
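When you have a count rather than the vector itself, seq_len behaves the same way as seq_along. A minimal sketch, where n_items and run_count are names chosen for this illustration:

```r
# Start from a count instead of a vector
empty_vec <- c()
n_items <- length(empty_vec)

# `1:n` counts down when n is 0, but `seq_len` returns an empty sequence
1:n_items          # 1 0
seq_len(n_items)   # integer(0)

# Count how many times the loop body actually runs
run_count <- 0
for(k in seq_len(n_items)){
  run_count <- run_count + 1
}
run_count  # 0
```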

Selecting Elements

`[`, `[[`, and `$`

In R, there are three primary methods of selecting elements in an object – [, [[, and $. However, many R users don’t actually know how the three methods differ from one another. The following attempts to clarify this! Let’s start with a multi-element list and then check out an example of each.

# Make a 3-element list
my_list <- list("a" = 1:3, "b" = "hello", "c" = 7:9)

If x is a train with multiple cars where each car may contain some number of items, x[1] grabs the whole first train car. This means that the extracted bit is still the same type of data as the original object; in this case that means we still have a list, just this time it has only a single element.

# Select with position
my_list[1]

$a
[1] 1 2 3

Using either element position or element name (if there is one) is supported.

# Select with name
my_list["c"]

$c
[1] 7 8 9
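Because single brackets always return a list, they can also grab several train cars at once. The sketch below re-creates the list so that it runs on its own; two_cars and no_b are illustrative names.

```r
# Re-create the 3-element list
my_list <- list("a" = 1:3, "b" = "hello", "c" = 7:9)

# Select several elements by position
two_cars <- my_list[c(1, 3)]
names(two_cars)    # "a" "c"
class(x = two_cars)  # "list"

# Drop an element with a negative position
no_b <- my_list[-2]
names(no_b)        # "a" "c"
```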

If x is a train with multiple cars where each car may contain some number of items, x[[1]] grabs the contents of the whole first train car. This means that the type of data changes to whatever is stored in that element. In this case that means we now have a vector.

my_list[[1]]

[1] 1 2 3

Again, both element position and element name (if there is one) are supported.

my_list[["c"]]

[1] 7 8 9

If x is a train with multiple cars where each car may contain some number of items, x$name also grabs the contents of a whole train car: the one named name. x$name is shorthand for x[["name"]] (strictly, x[["name", exact = FALSE]], because $ allows partial matching of names)! However, only the element name is supported when using this method for selecting an element.

my_list$a

[1] 1 2 3

`[[` versus `$`

The above examples show how [[ and $ function similarly but there is an important caveat to this! If name is an object containing one of the names in x, then the two methods differ. x[[name]] will get the entity that matches the value of name while x$name will get an entity that is itself named name. See an example below:

# Make a new list
my_list2 <- list("d" = 4, "e" = 5, "f" = 6)

# Make an object containing the name we want
wanted_bit <- "e"

# Select it with double brackets
my_list2[[wanted_bit]]

1: wanted_bit is interpreted as "e" because that is the value bound to that object.

[1] 5

# Select it with a dollar sign
my_list2$wanted_bit

2: This returns NULL because "wanted_bit" is not the name of any element of this list.

NULL
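One further difference worth knowing: on lists, $ partially matches element names while [[ requires an exact match by default. A minimal sketch, using a hypothetical list (my_list3) made up for this illustration:

```r
# Make a list with longer names
my_list3 <- list("alpha" = 1, "beta" = 2)

# `$` accepts a unique partial match of the name
my_list3$al                      # 1

# `[[` needs the exact name unless told otherwise
my_list3[["al"]]                 # NULL
my_list3[["al", exact = FALSE]]  # 1
```

Partial matching can hide typos, so double brackets with the full name are usually the safer choice in scripts and functions.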