Load the tidyverse meta-package as well as our simulated lichen data.
# Load needed librarylibrary(tidyverse)# Load datalich_r <-read.csv(file =file.path("data", "tree_lichen.csv"))# Check out first few rowshead(lich_r, n =3)
Load the pandas and os libraries as well as our simulated lichen data.
# Load needed librariesimport osimport pandas as pd# Load datalich_py = pd.read_csv(os.path.join("data", "tree_lichen.csv"))# Check out first few rowslich_py.head(3)
All tabular data can be said to have a “shape” that is either wide or long. This is often used to simply indicate whether there are more columns than there are rows but it is perhaps more accurate to say that wide data have variables in separate columns while long data usually has a column of variable names and a column of values. This distinction is important because you can have a data table with more rows than columns that is still in wide format (e.g., the simulated example lichen community data we just loaded).
When wrangling data we may find it necessary to change the data from one shape to the other–this process is what is meant by “reshaping” (or sometimes “pivoting”). An example of each type of reshaping is included below.
Reshaping Longer
Our lichen data are currently in wide format so let’s reshape into long format.
R uses the pivot_longer function from the tidyr package for this operation. We need to specify the data table to reshape, the columns to collapse (to the cols argument), as well as the new column names for both the old column names and the values they contained. Fortunately the names_to and values_to arguments are reasonably intuitively named.
# Reshape longerlong_r <- tidyr::pivot_longer(data = lich_r,cols =c(lichen_foliose, lichen_fruticose, lichen_crustose),names_to ="lichen_types",values_to ="percent_cover")# Check out the first few rows of thathead(long_r, n =3)
Note that the column names to reshape must be a vector.
Python uses the melt function from the pandas library for this operation. We need to specify the DataFrame to reshape, the columns to exclude from collapsing (with the id_vars argument), as well as the new column names for both the old column names and the values they contained. Fortunately the var_name and value_name arguments are reasonably intuitively named.
# Reshape longerlong_py = pd.melt(frame = lich_py, id_vars = ["tree"], var_name ="lichen_types", value_name ="percent_cover")# Check out the first few rows of thatlong_py.head(3)
Note that the column names use as IDs must be a list (or similar object type).
Reshaping Wider
The opposite operation–reshaping wider–is also readily available in either Python or R. We’ll reshape the data we made into long format above to demonsrate re-reshaping back into wide format.
R uses the pivot_wider function (also from the tidyr package) to do this type of reshaping operation. Reshaping wider requires fewer arguments as only the columns to reshape need to be specified.
# Reshape (back) into wide formatwide_r <- tidyr::pivot_wider(data = long_r,names_from ="lichen_types",values_from ="percent_cover")# Check out the first few rowshead(wide_r, n =2)
We can use the pandas function pivot_table to reshape back into wide format. Unlike R, we do need to specify the index column label(s) that are not being reshaped.
# Reshape (back) into wide formatwide_py = pd.pivot_table(data = long_py, index = ["tree"], columns ="lichen_types", values ="percent_cover")# Do some small index reformatting stepswide_py = wide_py.reset_index().rename_axis(mapper =None, axis =1)# Check out the first few rowswide_py.head(2)