# Make a vector of animal types
zoo_r <- c("lion", "tiger", "crocodile", "vulture", "hippo")
# Check that out
zoo_r
[1] "lion" "tiger" "crocodile" "vulture" "hippo"
Often we want to perform some set of operations repeatedly across a known number of iterations. For example, maybe we want to subset a given data file into a separate variable/object by month of data collection and export the resulting file as a CSV. We could simply copy/paste our ‘subset and export’ code as many times as needed but this can be error-prone. Also, it is cumbersome to manually update all copies of the relevant code when you identify a possible improvement.
One code solution to this is to automate the workflow using for loops (casually referred to simply as “loops”). The syntax of for loops is very similar in Python and R, likely because looping is such a fundamental operation in any coding language!
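As a rough preview of the ‘subset and export’ scenario described above, here is a minimal sketch using only Python's standard library. The records, column names, and file names are all hypothetical stand-ins for a real data file, and the outputs go to a temporary folder so nothing on disk is overwritten.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical records standing in for a real data file
records = [
    {"month": "jan", "site": "A", "count": 4},
    {"month": "jan", "site": "B", "count": 7},
    {"month": "feb", "site": "A", "count": 2},
    {"month": "mar", "site": "B", "count": 9},
]

# Write outputs to a temporary folder (an assumption made for safety)
out_dir = Path(tempfile.mkdtemp())

# Keep track of the files we create
written = []

# Loop across the unique months (sorted so the order is predictable)
for month in sorted({row["month"] for row in records}):
    # Subset to only this month's rows
    subset = [row for row in records if row["month"] == month]
    # Export that subset as its own CSV
    out_path = out_dir / f"data_{month}.csv"
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["month", "site", "count"])
        writer.writeheader()
        writer.writerows(subset)
    written.append(out_path.name)

print(written)
```

The ‘subset and export’ code is written once, so any later improvement only needs to be made in one place.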
Make a simple object to demonstrate loops.
With this simple variable/object in-hand we can now demonstrate the core facets of loops.
Loops (in either language) require a few core components in order to work properly:
- for statement: defines the start of the loop definition
- in statement: relates the loop variable/object to the list/vector to iterate across
To see this syntax in action we’ll use a simple loop that prints each animal type in the list/vector we created above.
In R, the for statement requires parentheses around the loop object, the in statement, and the vector to iterate across. The operation(s) performed in each iteration must be wrapped in curly braces ({...}).
When the code reaches the closing curly brace it returns to the top of the workflow and begins again with the next element of the provided vector.
[1] "lion"
[1] "tiger"
[1] "crocodile"
[1] "vulture"
[1] "hippo"
Note that when we are done the loop object still exists and is set to the last element of the vector we iterated across.
In Python, the for statement, loop variable, in statement, and list to iterate across do not use parentheses, but the line must end with a colon (:). The operation(s) performed in each iteration must be indented one level (i.e., press “tab” once or “space” four times).
When the code reaches the end of the indented lines it returns to the top of the workflow and begins again with the next item of the provided list.
lion
tiger
crocodile
vulture
hippo
Note that when we are done the loop variable still exists and is set to the last item of the list we iterated across.
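A minimal sketch of the kind of Python loop described here is given below. The list name zoo_py and the loop variable name animal are assumptions chosen to mirror the R vector zoo_r; the seen list is added only so the result can be inspected afterwards.

```python
# Hypothetical list name; mirrors the R vector `zoo_r`
zoo_py = ["lion", "tiger", "crocodile", "vulture", "hippo"]

# Collect what the loop visits so we can inspect it afterwards
seen = []

# Loop across the list; `animal` is the loop variable
for animal in zoo_py:
    # Indented lines run once per item
    print(animal)
    seen.append(animal)

# After the loop finishes, the loop variable still holds the last item
print(animal)
```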
We can also build conditional statements into a loop so that it can flexibly handle different outcomes. We have discussed conditional operators elsewhere so we’ll only explain the parts of loop conditionals that we haven’t already discussed. To demonstrate, we can loop across a set of numbers and use conditionals to print whether each value is less than, greater than, or equal to zero.
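In the example below we’ll use three new statements: if, else if (elif in Python), and else. Each statement only performs its operation(s) when its condition is met (i.e., returns True/TRUE); else has no condition of its own and runs only when none of the prior conditions are met.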
These three statements all have similar syntax to the for statement in that they evaluate something in parentheses and then perform some operation(s) in curly braces. They do differ slightly in context however:
- if can only be used first (or in cases where there is only if and else)
- else if can only be used after if (or after another else if) and allows for specifying another condition
- else can only be used at the end; catches only cases that don’t meet one of the prior conditions
# Loop across numbers
for(j in c(-2, -1, 0, 1, 2)){
  # If less than 0
  if(j < 0){
    print(paste(j, "is negative"))
  }
  # If greater than 0
  else if(j > 0){
    print(paste(j, "is positive"))
  }
  # If neither of those, then it must be 0!
  else {
    print(paste(j, "is zero!"))
  }
}
[1] "-2 is negative"
[1] "-1 is negative"
[1] "0 is zero!"
[1] "1 is positive"
[1] "2 is positive"
Note that to get the message to print correctly we needed to wrap a call to paste inside print so that the number and the text are assembled into a single character string.
These three statements all have similar syntax to the for statement in that they evaluate something before a colon and then perform some operation(s) after that colon. They do differ slightly in context however:
- if can only be used first (or in cases where there is only if and else)
- elif can only be used after if (or after another elif) and allows for specifying another condition
- else can only be used at the end; catches only cases that don’t meet one of the prior conditions
# Loop across numbers
for k in [-2, -1, 0, 1, 2]:
    # If less than 0
    if k < 0:
        print(str(k) + " is negative")
    # If greater than 0
    elif k > 0:
        print(str(k) + " is positive")
    # If neither of those, then it must be 0!
    else:
        print(str(k) + " is zero!")
-2 is negative
-1 is negative
0 is zero!
1 is positive
2 is positive
Note that to get the message to print correctly we needed to coerce the loop variable into type string (using the str function).
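To illustrate why that coercion matters, the short sketch below tries to combine a number and text directly before falling back to str. The variable names message and error_name are chosen here purely for illustration.

```python
k = -2

# Attempt 1: combine the number and the text directly (this raises an error)
try:
    message = k + " is negative"
    error_name = None
except TypeError as err:
    message = None
    error_name = type(err).__name__

# Report which kind of error (if any) was raised
print(error_name)

# Attempt 2: coerce the number to a string first, then combine
message = str(k) + " is negative"
print(message)
```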
Loops are a really powerful tool but they are limited in some ways. Sometimes we want to perform the same task in many different places (e.g., once per project) but only once in each place. Such an operation is certainly “repeated” but not really in the context where a loop makes sense. We can create reusable, modular code to fit these circumstances by writing our own custom functions (“custom” in the sense that we write them ourselves rather than load them from a particular library).
Let’s write a simple function in both languages that simply multiplies two arguments by one another and returns the result.
Generating a function in R shares some syntax elements with loops and conditional statements! In this case we use the function keyword to define our function, provide any needed arguments in parentheses, and end with curly braces containing the operation(s) performed by the function. If the function produces something that we want to give back to the user, we specify that with the return function (without it, R returns the value of the last evaluated expression).
Generating a function in Python shares some syntax elements with loops and conditional statements! In this case we use the def statement, then provide the name and (in parentheses) any needed arguments for our new function. If the function produces something that we want to give back to the user, we need to specify that by using the return statement.
# Multiplication function
def mult_py(n, i):
    # Add docstrings for later use (see below)
    """
    Multiply two values by one another.
    n -- First value to multiply
    i -- Second value to multiply
    """
    # Multiply the two values
    result_py = n * i
    # Return the product
    return result_py
# Once defined, we can invoke the function like we would any other
mult_py(n = 2, i = 5)
10
One component of custom functions to be aware of is their somewhat variable documentation. “Official” functions tend to be really well documented but custom functions have no required documentation. However, there are some best practices that we can try to follow ourselves to make life as easy as possible for people trying to intuit our functions’ purposes (including ourselves in the future!).
R contains no native mode of specifying function documentation! While there are tools to formalize this when functions are part of a formal package (see roxygen2 formatting), custom functions defined in a script have no built-in way to carry documentation. That said, it is still good practice to include plain-language comment lines that describe the function’s operations, even though they will only be visible where the function is defined.
Note that the docstring package for R simulates Python-style docstrings for R functions but is not part of “base” R.
Python custom functions allow us to specify triple-quoted ("""...""") documentation of function purpose/arguments known as “docstrings”. When a docstring is supplied, we can use the help function (or, in IPython/Jupyter, append a ? after the function name) to print whatever documentation was included in the function when it was defined.
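As a small sketch of retrieving a docstring, the block below redefines the multiplication function so that it is self-contained. The variable name doc_text is an assumption made for illustration.

```python
# Redefine the multiplication function with its docstring
def mult_py(n, i):
    """
    Multiply two values by one another.
    n -- First value to multiply
    i -- Second value to multiply
    """
    return n * i

# Print the formatted help page for our function
help(mult_py)

# The raw docstring is also stored on the function itself
doc_text = mult_py.__doc__
print(doc_text)
```

A function defined without a docstring would have its __doc__ attribute set to None.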
Sometimes a given argument will usually be set to the same value. In cases like this, we can define that value as the default of the argument, which allows users to not specify that argument at all. When users do specify something for that argument, it overrides the default behavior. All functions (and Python methods) with “optional” arguments are using defaults behind the scenes to make those arguments optional.
We can define these defaults when we first create a function! Let’s make a simple division function that divides the first argument by the second and sets the default of the second argument to 2.
Write and demonstrate the simple division function.
# Define function
div_r <- function(p, q = 2){
  # Do division
  result_r <- p / q
  # Return that
  return(result_r)
}
# Test this function
div_r(p = 10)
[1] 5
Use the function again but set the second argument ourselves.
Write and demonstrate the simple division function.
# Define function
def div_py(n, i = 2):
    # Write function documentation
    """
    Divide the first value by the second
    n -- Numerator
    i -- Denominator
    """
    # Do division
    result_py = n / i
    # Return that
    return result_py
# Use the function with the default
div_py(n = 10)
5.0
Use the function again but set the second argument ourselves.
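One way this might look is sketched below. The function is redefined so the block runs on its own, and the value 5 for the second argument is an arbitrary choice made for illustration.

```python
# Redefine the division function with a default for the second argument
def div_py(n, i = 2):
    """Divide the first value by the second"""
    return n / i

# Rely on the default denominator (2)
default_result = div_py(n = 10)
print(default_result)  # 5.0

# Override the default by supplying the second argument ourselves
custom_result = div_py(n = 10, i = 5)
print(custom_result)  # 2.0
```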
Just as with loops, we can build conditional statements into our functions to make them more flexible and broadly useful. Let’s combine conditionals with default values to demonstrate this.
Let’s make a simple addition function and set both arguments to default to NULL. NULL is an R constant that allows us to create an object without assigning any value to it.
Note that we’re also using the is.null function in our conditional in order to easily assess whether the argument has been left at its default (i.e., set to NULL) or defined.
Now let’s use the function without specifying either argument.
Let’s make a simple addition function and set both arguments to default to None. None is a Python constant that allows us to create a variable without assigning any value to it.
Note that we’re also using the is operator in our conditional. When checking against None it gives the same result as == in this case, though is remains the conventional way to test for None.
# Define addition function
def add_py(n = None, i = None):
    # Add documentation
    """Add two values (`n` and `i`)"""
    # If first argument is missing, set it to 2
    if n is None:
        n = 2
    # Do the same for the second argument
    if i is None:
        i = 2
    # Sum the two arguments
    result_py = n + i
    # Return that
    return result_py
Now let’s use the function without specifying either argument.
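A minimal sketch of those calls follows. The function is redefined so the block is self-contained, and the variable names holding the results are assumptions made for illustration.

```python
# Redefine the addition function with both arguments defaulting to None
def add_py(n = None, i = None):
    """Add two values (`n` and `i`)"""
    # Fall back to 2 for any argument left at its default
    if n is None:
        n = 2
    if i is None:
        i = 2
    return n + i

# Neither argument specified: both fall back to 2
both_default = add_py()
print(both_default)  # 4

# Only the first argument specified: the second falls back to 2
one_default = add_py(n = 5)
print(one_default)  # 7

# A supplied value of 0 is kept because it is not None
zero_kept = add_py(n = 0)
print(zero_kept)  # 2
```

Checking against None specifically, rather than testing whether the argument is "empty", means that a deliberately supplied value such as 0 is kept.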