The below tutorials are all focused on SQL. See sub-headings for more specifics! Acronyms used in this page include:
- SQL = Structured Query Language
- QA = Quality Assurance
SQL Term Glossary
I don’t do much SQL in my day-to-day (at time of writing anyway) so having a term glossary will prove useful in brushing up periodically. Note that SQL is mostly not case sensitive but I find it easier to keep all SQL verbs fully capitalized for some visual distinction between functions versus data objects/variables.
Core Terms
I’m dividing the core terms based on a general category for their purpose.
; |
{other fxns}; |
A semicolon indicates the end of a clause/set of clauses in SQL. Including it is necessary to indicate that a query should be executed |
SELECT |
SELECT {column name(s)} |
Pick only some columns from a given data table. Comparable to the function of the same name from the dplyr package in R |
FROM |
FROM {table name} |
Specify on which table the query should be operating. This is comparable to a data argument in R or parameter in Python |
WHERE |
WHERE {conditions} |
Impose conditions to restrict which rows of the table should be returned. Uses a number of conditional verbs |
ORDER BY |
ORDER BY {column} |
Sort the rows of the returned table by the specified column |
OFFSET |
OFFSET {number of rows} |
Return only the specified number of rows |
GROUP BY |
GROUP BY {column} |
Perform subsequent operations within unique levels of the specified column. Comparable to the function of the same name from the dplyr package in R |
CREATE TABLE |
CREATE TABLE {new table} ({column} {DataType} {TableConstraint} DEFAULT {default value}, ...) |
Make a new table by naming each column, deciding on its data type, giving it a constraint (optional), and setting the default value (optional) |
ALTER TABLE |
ALTER TABLE {table} |
Similar syntax to CREATE TABLE, same facets that were optional for that term are still optional here |
DROP TABLE |
DROP TABLE {table} |
Like CREATE TABLE, can be used with or without a IF EXISTS clause. If you have tables that are dependent upon other tables (e.g., because of FOREIGN KEY), then you must remove that dependency before you can drop the table |
Subordinates to Core Terms
To my knowledge, the following SQL verbs can only be used in combination with one of the ‘core’ verbs included in the table above. Where possible, I’ve separated the subordinate terms into separate tabs depending on to which core term they are subordinate.
Aliasing with SELECT
AS |
SELECT {column} AS {alias} |
Create an alias of the specified column. Aliases can be used to increase clarity of column names. Subsequent code can then use the alias instead of the original/actual column name. This also works for FROM |
Summarizing with SELECT
DISTINCT |
SELECT DISTINCT {column name} |
Keep only rows with unique values in the specified column. Comparable to the function of the same name from the dplyr package in R |
COUNT |
SELECT COUNT({column}) AS {new column} |
Counts the number of each unique entity in the specified column |
MIN/MAX/AVG/SUM |
SELECT AVG({column}) AS {new column} |
Computes the minimum/maximum/mean/sum (respectively) of the specified column |
=/!= |
WHERE {column} != "{character}" |
Conditional operator for numbers and characters. Note that character conditions are case-sensitive and must be in quotes! Operators mean:, ‘equal to’ and ‘not equal to’ respectively |
</>/<=/>= |
WHERE {column} >= {#} |
Conditional operators for numbers only. Operators mean:, ‘less than’, ‘greater than’, ‘less than or equal to’, and ‘greater than or equal to’ respectively |
AND |
WHERE {condition} AND {condition} |
Keep rows where both/all conditions are met. Comparable to R’s & |
OR |
WHERE {condition} OR {condition} |
Keep rows where either/at least one condition is met. Comparable to R’s | |
BETWEEN / NOT BETWEEN |
WHERE {column} BETWEEN {#} AND {#} |
Keep rows where the value is/is not between the specified numbers |
IN / NOT IN |
WHERE {column} IN ({#}, {#}, {#}) |
Keep rows where the value is/is not in the provided set of numbers. Comparable to R’s %in% operator. Works for bare numbers or quoted characters |
LIKE / NOT LIKE |
WHERE {column} LIKE "%{character}%" |
Keeps rows where the value has/doesn’t have at least a partial string match to the provided character (e.g., “mar” in “march”). This is case insenstive! |
LIKE / NOT LIKE |
WHERE {column} LIKE "{character}_" |
Keeps rows where the value matches the string plus one character. Excludes an exact string match without that additional character. This is case insenstive! |
IS NULL / IS NOT NULL |
WHERE {column} IS NULL |
Keep rows where the specified column is/is not NA |
Aliasing Data
AS |
FROM {table} AS {alias} |
Create an alias of the specified table. Aliases can be used to increase clarity of tables. Subsequent code can then use the alias instead of the original/actual table name. This also works for SELECT |
Joining Data
All JOIN functions require ON as well. Note also that ON assumes the left column name(s) corresponds to the left table and vice versa but you can also specify the second table’s key explicitly with a period separating the table from its key (e.g., {table B}.{table B key}).
INNER JOIN |
FROM {table A} INNER JOIN {table B} ON {table A key} = {table B key} |
Returns only rows where a given value is found in both tables’ key columns. Returns all columns from both tables where the relevant match is found. Note also that JOIN is equivalent to INNER JOIN but the longer form is retained here for the sake of precision |
LEFT JOIN |
FROM {table A} LEFT JOIN {table B} ON {table A key} = {table B key} |
Returns only rows where a given value is found in table A’s key column. Returns all columns from both tables where the relevant match is found |
RIGHT JOIN |
FROM {table A} RIGHT JOIN {table B} ON {table A key} = {table B key} |
Returns only rows where a given value is found in table B’s key column. Returns all columns from both tables where the relevant match is found |
FULL JOIN |
FROM {table A} FULL JOIN {table B} ON {table A key} = {table B key} |
Returns only rows where a given value is found in either tables’ key column. Returns all columns from both tables where the relevant match is found |
ORDER BY Subordinates
ASC / DESC |
ORDER BY {column} ASC |
Order the rows in ascending/descending order of the provided column |
OFFSET Subordinates
LIMIT |
OFFSET {number of rows} LIMIT {row number} |
Specify from which row the offset rows should begin |
GROUP BY Subordinates
HAVING |
HAVING {group condition} |
Same as WHERE but only works if you first use GROUP BY. If not using GROUP BY, use WHERE instead |
CREATE TABLE Subordinates
IF NOT EXISTS |
CREATE TABLE IF NOT EXISTS {new table} ... |
Avoids the error that SQL will return if you try to make a new table with the same name as an existing one. |
BOOLEAN / INTEGER / FLOAT / DOUBLE / REAL |
CREATE TABLE {new table} ({column} {DataType}, ...) |
Allowed types of number data (boolean is 0-1, others are self-explanatory) |
CHARACTER / VARCHAR / TEXT |
CREATE TABLE {new table} ({column} {DataType}, ...) |
Allowed types of text data |
DATE / DATETIME |
CREATE TABLE {new table} ({column} {DataType}, ...) |
Allowed types of date data. Note that there are similar formatting issues here that you may experience with this type of data in other programming languages |
PRIMARY KEY / AUTOINCREMENT / UNIQUE / NOT NULL |
CREATE TABLE {new table} ({column} {DataType} {TableConstraint}, ...) |
Constraints that can be placed on a new column. The “primary key” is the row that uniquely identifies each row and ties it to some other relational table. |
PRIMARY KEY |
CREATE TABLE {new table} ({column} {DataType} PRIMARY KEY, ...) |
Constraints that can be placed on a new column. The “primary key” is the column that uniquely identifies each row and ties it to some other relational table |
AUTOINCREMENT |
CREATE TABLE {new table} ({column} {DataType} AUTOINCREMENT, ...) |
Constraints that can be placed on a new column. Only applies to integers. Only supported by some databases |
UNIQUE |
CREATE TABLE {new table} ({column} {DataType} UNIQUE, ...) |
Constraints that can be placed on a new column. Doesn’t allow repeated entries when inserting rows into this table. Differs from the “primary key” because it doesn’t necessarily have to be able to identify a specific row |
NOT NULL |
CREATE TABLE {new table} ({column} {DataType} NOT NULL, ...) |
Constraints that can be placed on a new column. Self-explanatory |
CHECK |
CREATE TABLE {new table} ({column} {DataType} CHECK({expression}), ...) |
Allows for checking of whether values to be inserted match some user-specified expression. Good for QA of new data |
FOREIGN KEY |
CREATE TABLE {new table} ({column} {DataType} FOREIGN KEY, ...) |
Guarantees that each value in this column is found in a column in another table |
ALTER TABLE Subordinates
ADD |
ALTER TABLE {table} ADD {column} {DataType} {TableConstraint} DEFAULT {default value} |
Similar syntax to CREATE TABLE. Features that were optional for that function are still optional here |
DROP |
ALTER TABLE {table} DROP {column} |
Simpler than some alternatives because you just have to name the function to be deleted |
RENAME |
ALTER TABLE {table} RENAME {column} AS {column} |
More permanent than the alias conferred by AS when used with SELECT |
SQL Order of Operations
The general order of operations for a SQL query is as follows:
FROM / JOIN
WHERE & subordinates
GROUP BY
HAVING & subordinates
SELECT & subordinates
ORDER BY & subordinates
OFFSET & subordinates