Welcome!
I think of myself as a competent R coder but am a total novice when it comes to Python. This repository is my attempt at forcing myself to ‘eat my vegetables’ and gain basic competency in Python. I think trying for a 1-to-1 R translation to Python will be an effective learning method (at least to start) and enshrining it in a Quarto website will keep me rigorous about documenting my process.
Package Installation
Both languages rely on packages to provide functions that are absent from the ‘base’ version of either Python or R. The following code chunks are not evaluated when this website is built, but you’ll need to install these packages on your local machine (if you haven’t already done so) in order to run the code in the rest of this translation tutorial.
R contains an install.packages function for installing the desired packages.
# The 'tidyverse' meta-package contains many of the tools we'll need
install.packages("tidyverse")
# For SQL operations, DBI is needed
install.packages("DBI")
# For spatial operations we'll need sf and terra
install.packages("sf")
install.packages("terra")
Python packages must be installed via the command line.
# The numpy and pandas packages contain fundamental Python tools
python3 -m pip install pandas
python3 -m pip install numpy
# For plotting we'll want some other packages
python3 -m pip install matplotlib
python3 -m pip install plotnine
## plotnine is Python's ggplot2
# For SQL operations we'll use the sqlite3 module
## sqlite3 ships with Python's standard library, so there is nothing to install
# For spatial operations we'll need a few packages
python3 -m pip install geopandas
python3 -m pip install rasterio
python3 -m pip install rioxarray
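If you're not sure which of these packages are already on your machine, a quick check from within Python can help. The following is only a sketch: the helper name missing_packages is my own invention (not part of any library) and the list assumes each package's import name matches its install name, which holds for the packages above.

```python
from importlib.util import find_spec

# Import names of the packages used in this tutorial
# (sqlite3 is included for completeness; it ships with Python itself)
PACKAGES = [
    "pandas", "numpy", "matplotlib", "plotnine",
    "sqlite3", "geopandas", "rasterio", "rioxarray",
]


def missing_packages(names):
    """Return the names in `names` that Python cannot locate for import."""
    # find_spec() returns None when a top-level module cannot be found
    return [name for name in names if find_spec(name) is None]


missing = missing_packages(PACKAGES)
if missing:
    print("Still need to install:", ", ".join(missing))
else:
    print("All packages found!")
```

Anything reported as missing can then be installed from the command line as shown above.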
Section Overviews
Fundamentals
When learning a new programming language, it can be really helpful to begin with dramatically simplified examples that demonstrate crucial concepts. We can then build from those core concepts toward more nuanced fundamentals like automation or string/character methods.
Data Wrangling
The beating heart of my day-to-day work revolves around data ‘wrangling’. I view ‘wrangling’ as any scripted data manipulations between the very first raw data entered digitally and the data being ready for any analysis/visualization. This covers a huge swath of operations and should allow me to explore Python equivalents to many of the R operations that I know and love.
Visualization
Somewhat self-explanatory but this section is all about data visualization! While visualization can be an effective quality control tool it is also useful in data exploration and–eventually–to generate publication- or report-quality graphics. This section attempts to cover the fundamentals of data “viz” in both languages but is by no means exhaustive!
SQL
SQL is a powerful programming language in its own right that is intended to work with relational databases. Relational databases include several data tables of various sizes/structures that share some common ‘index’ columns that allow them to be combined as needed. Both Python and R allow users to access these databases using SQL syntax while still living in their preferred coding language. This section highlights some of the major considerations when working with databases through either language though it is not a tutorial on SQL’s syntax itself.
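To make the ‘shared index column’ idea concrete, here is a minimal sketch using Python's built-in sqlite3 module and a throwaway in-memory database. The tables (sites, counts) and their contents are made up purely for illustration and are not data used elsewhere on this website.

```python
import sqlite3

# Create a temporary database that lives only in memory
conn = sqlite3.connect(":memory:")

# Two hypothetical tables that share a 'site_id' index column
conn.execute("CREATE TABLE sites (site_id INTEGER PRIMARY KEY, site_name TEXT)")
conn.execute("CREATE TABLE counts (site_id INTEGER, species TEXT, n INTEGER)")
conn.executemany("INSERT INTO sites VALUES (?, ?)",
                 [(1, "Meadow"), (2, "Forest")])
conn.executemany("INSERT INTO counts VALUES (?, ?, ?)",
                 [(1, "bee", 12), (2, "bee", 3), (2, "moth", 7)])

# SQL syntax, written as a string, combines the tables via the shared column
query = """
SELECT sites.site_name, SUM(counts.n) AS total
FROM counts
JOIN sites ON counts.site_id = sites.site_id
GROUP BY sites.site_name
ORDER BY sites.site_name
"""
rows = conn.execute(query).fetchall()
print(rows)  # [('Forest', 10), ('Meadow', 12)]

conn.close()
```

The SQL itself is just text handed to the database; everything around it (creating the connection, collecting the results) stays in Python.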
Glossary
As the heading would suggest, I’m housing various term definitions here. As of now, it makes most sense to me to provide the definition for a concept and then give the term in Python & R. Note that I also give a more functional definition of major concepts in the code tutorial pages upon first mention.
Contributing Guidelines
Contributing Overview
A comprehensive and accessible coding bilingualism website like this one is a huge undertaking and I’d welcome collaborators who share my vision for the value of a resource like this. I’m a competent R coder but that definitely does not mean I am 100% correct all the time nor that I always write explanations in the clearest way possible. On the Python front, my first real foray into that coding language is taking tutorials and making this website.
So, if you’d like to collaborate with me on this I have drafted the following guidelines. I’m happy to discuss/modify these with prospective collaborators so please don’t let them dissuade you from reaching out!
Bug / Issue Reporting
If you see something wrong–either in a code chunk or in the plain text–I’d really appreciate it if you flagged it for my attention. You can do this by opening a GitHub Issue on this project’s repository. Please include the link to the page with the problem and as much detail as possible so I can easily find the problem area and make any needed repairs.
If you identify a bug in this manner I’ll add your name (and the link to the professional website of your choosing) to the list below!
Bug Finders
- Timothy Divoll – GitHub profile
Co-Development
If you’d like to actively collaborate with me on developing and refining this website that would be awesome! Please either reach out to me directly (see my website for my contact info) or open a GitHub Issue to get that conversation started.
I’m envisioning that each new collaborator would (1) fork the website’s GitHub repository, (2) make any edits that they had in mind, and then (3) submit a pull request to get those changes integrated into the primary website. I feel this minimizes the risk that changes have unintentional consequences for the website rendering as a whole. I’m absolutely open to a branch-based alternative if that makes more sense and I anticipate refining the logistical elements of contributing once the development team grows somewhat.
Contribution Credit
If you are interested/willing to join me in refining this website, I believe that is absolutely worthy of formal credit. You’ll notice that the top right of the navbar has a “Creators” dropdown menu. If you contribute substantively (e.g., demonstrating new tools, adding a new page, etc.) we can add your name to the dropdown and link it to the professional website of your choosing.
If you have other modes of properly acknowledging your contribution(s) in mind I am absolutely open to discussing those ideas!
Additional Resources
If this is of interest, consider checking out the language-specific tutorials and other attempts at R/Python bilingualism listed below.
Bilingualism Resources
- The EEOB-BioData faculty at Iowa State University offer a “Computational Skills For Biological Data” course that covers R and Python (and Unix)
- ESIIL (Environmental Data Science Innovation and Inclusion Lab) has created an R-Python bilingualism tutorial that is framed for a more applied audience
- Marie Rivers also made a website for exploring Quarto more generally that I’ve found super helpful for the part of it that deals with R/Python bilingualism
- Rebecca Barter made a nice blog post titled “An Introduction to Python for R Users” that focuses on comparing Python operations to the logic of R (rather than side-by-side comparison)
Python Resources
- The Carpentries has a “Data Analysis and Visualization in Python for Ecologists” lesson that is really well put-together (as is characteristic of The Carpentries’ content)
- The Earth Lab has a Python course that was recommended to me by an NCEAS employee (specifically section 4)
- Dr. Diba Mirza taught a UCSB Computer Science (CS8) course on Python
- There’s a nice 30 Days of Python challenge on GitHub you can work through at your own pace that covers a breadth of Python skills
R Resources
- R for Data Science is a free online book that does a really nice job covering a lot of powerful R topics
- The Carpentries also has a “Data Analysis and Visualization in R for Ecologists” lesson that is very well done