Core Concepts

Overview

This section covers some of the most fundamental operations in both R and Python: variable/object assignment, arithmetic, data types/classes, indexing, and slicing. Working with external data is not covered on this page.

Note that any text in a code chunk that follows a hash symbol (#) is a “comment” and is not evaluated in either language. Including comments is good practice because it helps humans read and understand code that might otherwise be unclear to them.

Assignment

At its most basic, we want to store data in a way that lets our scripts use and manipulate it. This requires assigning data to a variable/object with the assignment operator.

In R, the assignment operator is <-. The name of the new object goes on the left of the arrow and the value to assign goes on the right.

# Make a simple object
a <- 2

# Check it out
a
[1] 2

In Python, the assignment operator is =. The name of the new variable goes on the left of the equal sign and the value to assign goes on the right.

# Make a simple object
a = 2

# Check it out
a
2
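A small sketch of what assignment means in practice: once a value is stored, it can feed into another assignment, and reassigning the original name afterwards does not change results already computed from it. The name b here is only illustrative.

```python
# Store a value, then build a new variable from it
a = 2
b = a + 1    # b holds 3

# Reassigning a later rebinds the name a; b keeps the value 3
a = 10
print(a, b)  # prints: 10 3
```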

Once we’ve created a variable/object we can then use the information stored inside of it in downstream operations! For example, we could perform basic arithmetic on our variable/object and assign the result to a new variable/object.

Addition, subtraction, multiplication, and division share operators across both languages (+, -, *, and / respectively). However, R uses ^ for exponents.

# Raise to an exponent
b <- a^3

# Check out the result
b
[1] 8

Addition, subtraction, multiplication, and division share operators across both languages (+, -, *, and / respectively). However, Python uses ** for exponents.

# Raise to an exponent
b = a**3

# Check out the result
b
8
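A minimal sketch of the remaining arithmetic operators in Python, reusing a = 2. One trap for R users: in Python, ^ is not exponentiation but bitwise XOR, so a ^ 3 silently returns a different number instead of raising an error. Note also that / always returns a float, even when the division is exact.

```python
a = 2

# The four operators shared with R
total = a + 3        # 5
difference = a - 3   # -1
product = a * 3      # 6
quotient = a / 2     # 1.0: true division always returns a float

# Exponentiation uses **
power = a ** 3       # 8

# ^ is bitwise XOR: 2 (binary 10) XOR 3 (binary 11) is 1 (binary 01)
xor_result = a ^ 3   # 1, not 8

print(total, difference, product, quotient, power, xor_result)
# prints: 5 -1 6 1.0 8 1
```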

Type & Class

Some operations are only possible on certain categories of information. For instance, we can only perform arithmetic on numbers. In Python this category is known as the variable’s type, while in R it is the object’s class. In either case, it’s important to know, and be able to check, this information about the variables/objects with which we are working.

In R we use the class function to get this information. Note that the names of R classes sometimes differ from their equivalents in Python.

# Check class of a whole number (R stores 37 as a double, so it is "numeric")
class(37)
[1] "numeric"
# Check class of a decimal
class(3.14159)
[1] "numeric"
# Check class of text
class("my hands are typing words")
[1] "character"

In Python, the type function returns the type of the data object. Note that the names of Python types sometimes differ from their equivalents in R.

# Check type of an integer
type(37)
<class 'int'>
# Check type of a decimal
type(3.14159)
<class 'float'>
# Check type of text
type("my hands are typing words")
<class 'str'>
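Because R reports both 37 and 3.14159 as "numeric", it is easy to forget that Python distinguishes int from float. The sketch below shows why checking the type matters: adding a str to an int raises a TypeError, and converting the string with int() first makes the arithmetic work. The variable names are illustrative.

```python
# Whole numbers and decimals are different types in Python
print(type(37) is int)         # True
print(type(3.14159) is float)  # True

# isinstance() is the usual way to test a value's type
text = "37"
print(isinstance(text, str))   # True

# Arithmetic on mismatched types fails
try:
    text + 3
except TypeError as err:
    print("TypeError:", err)

# Converting the string to an int first makes the arithmetic work
converted = int(text) + 3
print(converted)               # 40
```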

Indexing

When our variables/objects have more than one item/element we may want to examine the piece of information at a specific position. This position is the “index position” and can be accessed in either language fairly easily.

In order to explore this more fully, let’s make some example multi-component variables/objects.

In R, one of the fundamental data structures is a “vector”. Vectors are assembled with the concatenation function (c) where each item is separated by commas (,) and the set of them is wrapped in parentheses ((...)).

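Note that the class of the object comes from the vector’s contents rather than from the fact that it is a vector. All elements in a vector must therefore share a class; if you combine values of different classes, R coerces them to a common one (for example, c(1, "a") is a character vector).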

# Make a multi-item variable
x <- c(1, 2, 3, 4, 5)

# Check it out
class(x)
[1] "numeric"

In Python, one of the fundamental data structures is the “list”. Lists are usually assembled by wrapping the items in square brackets ([...]) and separating them with commas (,). The list function can also build a list, but it converts an existing iterable (such as a range) rather than taking the items as separate arguments.

Note that the type of the variable comes from the list itself rather than its contents. Lists therefore support items of multiple different types.

# Make a multi-item variable
x = [1, 2, 3, 4, 5]

# Check it out
type(x)
<class 'list'>
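A short sketch: a single list can hold items of different types, and list() builds a list from an existing iterable (here range(1, 6), which yields 1 through 5) rather than from comma-separated arguments. The name mixed is illustrative.

```python
# One list, three different item types
mixed = [1, "two", 3.0]
print(type(mixed))                               # <class 'list'>
print([type(item).__name__ for item in mixed])   # ['int', 'str', 'float']

# list() converts an iterable into a list
x = list(range(1, 6))
print(x)                                         # [1, 2, 3, 4, 5]
```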

One crucial difference between R and Python is that Python is “0-based”, meaning that the first item is at index position 0, whereas R is “1-based”: the equivalent element is at position 1.

Fortunately, in either language the syntax for indexing is the same.

To index a multi-element object, simply append square brackets to the end of the object name and specify the number of the index position in which you are interested.

# Access the first element of the vector
x[1]
[1] 1

To index a multi-item variable, simply append square brackets to the end of the variable name and specify the number of the index position in which you are interested.

# Access the first item of the list
x[0]
1
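Two further indexing behaviours that often surprise R users, sketched with the same x list. Python accepts negative indices, which count back from the end (x[-1] is the last item), whereas in R x[-1] drops the first element. Indexing past the end raises an IndexError rather than returning NA as R does. Converting an R position to a Python index means subtracting 1; r_position is an illustrative name.

```python
x = [1, 2, 3, 4, 5]

# Negative indices count from the end of the list
print(x[-1])   # 5
print(x[-2])   # 4

# An R position maps to the Python index one lower
r_position = 3                 # the element R would return with x[3]
print(x[r_position - 1])       # 3

# Indexing past the end is an error, not a missing value
try:
    x[5]
except IndexError:
    print("index 5 is out of range for a list of length", len(x))
```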

Slicing

When we index more than one position, this is known as “slicing”. We still use square brackets in either language to slice multiple items/elements, and the syntax inside those brackets looks the same, but it yields different results because the two languages interpret the colon differently.

In R, two integers separated by a colon (:) produce a sequence containing both numbers and every integer between them.

# Demonstrate that the colon is shorthand for 'all numbers between'
1:10
 [1]  1  2  3  4  5  6  7  8  9 10

We can use this to slice out multiple continuous index positions from an object.

# Slice items in the `x` object
x[2:4]
[1] 2 3 4

In Python, we slice by placing a start index and a stop index, separated by a colon (:), inside square brackets. The start index works exactly as it does for single-item indexing, but the stop index is exclusive: the slice ends just before the item at that index. One way to picture this is to imagine the bounds sitting between items, with bound 0 just before the first item, bound 1 between the first and second items, and so on.

Another way of thinking about this is as a half-open interval, [start, stop): the starting bound is inclusive while the ending bound is exclusive.

# Slice out several items of the Python list
x[2:4]
[3, 4]

Notice that we get only the third and fourth items (index positions 2 and 3), even though 4 appears after the colon and x[4] on its own would return the fifth item. That is because bound 4 sits after the fourth item and before the fifth.
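A sketch relating Python slices to R’s colon ranges, again using x = [1, 2, 3, 4, 5]. The number of items in x[start:stop] is stop - start, either bound can be omitted, and range() follows the same inclusive-start, exclusive-stop rule. To reproduce R’s x[2:4] (values 2, 3, 4), subtract 1 from the start only.

```python
x = [1, 2, 3, 4, 5]

# Same elements as R's x[2:4]: shift the start down by 1, keep the stop
print(x[1:4])          # [2, 3, 4]

# A slice contains stop - start items
start, stop = 2, 4
print(len(x[start:stop]) == stop - start)  # True

# Omitted bounds default to the start or end of the list
print(x[:2])           # [1, 2]
print(x[2:])           # [3, 4, 5]

# range() uses the same half-open rule; R's 1:10 corresponds to range(1, 11)
print(list(range(1, 11)))  # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```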