Structured Query Language Tutorials

The below tutorials are all focused on SQL. See sub-headings for more specifics! Acronyms used in this page include:

SQL Term Glossary

I don’t do much SQL in my day-to-day (at time of writing anyway) so having a term glossary will prove useful in brushing up periodically. Note that SQL is mostly not case sensitive but I find it easier to keep all SQL verbs fully capitalized for some visual distinction between functions versus data objects/variables.

Core Terms

I’m dividing the core terms based on a general category for their purpose.

Function Syntax Explanation
; {other fxns}; A semicolon indicates the end of a clause/set of clauses in SQL. Including it is necessary to indicate that a query should be executed
SELECT SELECT {column name(s)} Pick only some columns from a given data table. Comparable to the function of the same name from the dplyr package in R
FROM FROM {table name} Specify on which table the query should be operating. This is comparable to a data argument in R or parameter in Python
WHERE WHERE {conditions} Impose conditions to restrict which rows of the table should be returned. Uses a number of conditional verbs
ORDER BY ORDER BY {column} Sort the rows of the returned table by the specified column
OFFSET OFFSET {number of rows} Return only the specified number of rows
GROUP BY GROUP BY {column} Perform subsequent operations within unique levels of the specified column. Comparable to the function of the same name from the dplyr package in R
Function Syntax Explanation
CREATE TABLE CREATE TABLE {new table} ({column} {DataType} {TableConstraint} DEFAULT {default value}, ...) Make a new table by naming each column, deciding on its data type, giving it a constraint (optional), and setting the default value (optional)
Function Syntax Explanation
ALTER TABLE ALTER TABLE {table} Similar syntax to CREATE TABLE, same facets that were optional for that term are still optional here
Function Syntax Explanation
DROP TABLE DROP TABLE {table} Like CREATE TABLE, can be used with or without a IF EXISTS clause. If you have tables that are dependent upon other tables (e.g., because of FOREIGN KEY), then you must remove that dependency before you can drop the table

Subordinates to Core Terms

To my knowledge, the following SQL verbs can only be used in combination with one of the ‘core’ verbs included in the table above. Where possible, I’ve separated the subordinate terms into separate tabs depending on to which core term they are subordinate.

Aliasing with SELECT

Function Syntax Explanation
AS SELECT {column} AS {alias} Create an alias of the specified column. Aliases can be used to increase clarity of column names. Subsequent code can then use the alias instead of the original/actual column name. This also works for FROM


Summarizing with SELECT

Function Syntax Explanation
DISTINCT SELECT DISTINCT {column name} Keep only rows with unique values in the specified column. Comparable to the function of the same name from the dplyr package in R
COUNT SELECT COUNT({column}) AS {new column} Counts the number of each unique entity in the specified column
MIN/MAX/AVG/SUM SELECT AVG({column}) AS {new column} Computes the minimum/maximum/mean/sum (respectively) of the specified column
Function Syntax Explanation
=/!= WHERE {column} != "{character}" Conditional operator for numbers and characters. Note that character conditions are case-sensitive and must be in quotes! Operators mean:, ‘equal to’ and ‘not equal to’ respectively
</>/<=/>= WHERE {column} >= {#} Conditional operators for numbers only. Operators mean:, ‘less than’, ‘greater than’, ‘less than or equal to’, and ‘greater than or equal to’ respectively
AND WHERE {condition} AND {condition} Keep rows where both/all conditions are met. Comparable to R’s &
OR WHERE {condition} OR {condition} Keep rows where either/at least one condition is met. Comparable to R’s |
BETWEEN / NOT BETWEEN WHERE {column} BETWEEN {#} AND {#} Keep rows where the value is/is not between the specified numbers
IN / NOT IN WHERE {column} IN ({#}, {#}, {#}) Keep rows where the value is/is not in the provided set of numbers. Comparable to R’s %in% operator. Works for bare numbers or quoted characters
LIKE / NOT LIKE WHERE {column} LIKE "%{character}%" Keeps rows where the value has/doesn’t have at least a partial string match to the provided character (e.g., “mar” in “march”). This is case insenstive!
LIKE / NOT LIKE WHERE {column} LIKE "{character}_" Keeps rows where the value matches the string plus one character. Excludes an exact string match without that additional character. This is case insenstive!
IS NULL / IS NOT NULL WHERE {column} IS NULL Keep rows where the specified column is/is not NA

Aliasing Data

Function Syntax Explanation
AS FROM {table} AS {alias} Create an alias of the specified table. Aliases can be used to increase clarity of tables. Subsequent code can then use the alias instead of the original/actual table name. This also works for SELECT


Joining Data

All JOIN functions require ON as well. Note also that ON assumes the left column name(s) corresponds to the left table and vice versa but you can also specify the second table’s key explicitly with a period separating the table from its key (e.g., {table B}.{table B key}).

Function Syntax Explanation
INNER JOIN FROM {table A} INNER JOIN {table B} ON {table A key} = {table B key} Returns only rows where a given value is found in both tables’ key columns. Returns all columns from both tables where the relevant match is found. Note also that JOIN is equivalent to INNER JOIN but the longer form is retained here for the sake of precision
LEFT JOIN FROM {table A} LEFT JOIN {table B} ON {table A key} = {table B key} Returns only rows where a given value is found in table A’s key column. Returns all columns from both tables where the relevant match is found
RIGHT JOIN FROM {table A} RIGHT JOIN {table B} ON {table A key} = {table B key} Returns only rows where a given value is found in table B’s key column. Returns all columns from both tables where the relevant match is found
FULL JOIN FROM {table A} FULL JOIN {table B} ON {table A key} = {table B key} Returns only rows where a given value is found in either tables’ key column. Returns all columns from both tables where the relevant match is found

ORDER BY Subordinates

Function Syntax Explanation
ASC / DESC ORDER BY {column} ASC Order the rows in ascending/descending order of the provided column


OFFSET Subordinates

Function Syntax Explanation
LIMIT OFFSET {number of rows} LIMIT {row number} Specify from which row the offset rows should begin


GROUP BY Subordinates

Function Syntax Explanation
HAVING HAVING {group condition} Same as WHERE but only works if you first use GROUP BY. If not using GROUP BY, use WHERE instead


CREATE TABLE Subordinates

Function Syntax Explanation
IF NOT EXISTS CREATE TABLE IF NOT EXISTS {new table} ... Avoids the error that SQL will return if you try to make a new table with the same name as an existing one.
BOOLEAN / INTEGER / FLOAT / DOUBLE / REAL CREATE TABLE {new table} ({column} {DataType}, ...) Allowed types of number data (boolean is 0-1, others are self-explanatory)
CHARACTER / VARCHAR / TEXT CREATE TABLE {new table} ({column} {DataType}, ...) Allowed types of text data
DATE / DATETIME CREATE TABLE {new table} ({column} {DataType}, ...) Allowed types of date data. Note that there are similar formatting issues here that you may experience with this type of data in other programming languages
PRIMARY KEY / AUTOINCREMENT / UNIQUE / NOT NULL CREATE TABLE {new table} ({column} {DataType} {TableConstraint}, ...) Constraints that can be placed on a new column. The “primary key” is the row that uniquely identifies each row and ties it to some other relational table.
PRIMARY KEY CREATE TABLE {new table} ({column} {DataType} PRIMARY KEY, ...) Constraints that can be placed on a new column. The “primary key” is the column that uniquely identifies each row and ties it to some other relational table
AUTOINCREMENT CREATE TABLE {new table} ({column} {DataType} AUTOINCREMENT, ...) Constraints that can be placed on a new column. Only applies to integers. Only supported by some databases
UNIQUE CREATE TABLE {new table} ({column} {DataType} UNIQUE, ...) Constraints that can be placed on a new column. Doesn’t allow repeated entries when inserting rows into this table. Differs from the “primary key” because it doesn’t necessarily have to be able to identify a specific row
NOT NULL CREATE TABLE {new table} ({column} {DataType} NOT NULL, ...) Constraints that can be placed on a new column. Self-explanatory
CHECK CREATE TABLE {new table} ({column} {DataType} CHECK({expression}), ...) Allows for checking of whether values to be inserted match some user-specified expression. Good for QA of new data
FOREIGN KEY CREATE TABLE {new table} ({column} {DataType} FOREIGN KEY, ...) Guarantees that each value in this column is found in a column in another table


ALTER TABLE Subordinates

Function Syntax Explanation
ADD ALTER TABLE {table} ADD {column} {DataType} {TableConstraint} DEFAULT {default value} Similar syntax to CREATE TABLE. Features that were optional for that function are still optional here
DROP ALTER TABLE {table} DROP {column} Simpler than some alternatives because you just have to name the function to be deleted
RENAME ALTER TABLE {table} RENAME {column} AS {column} More permanent than the alias conferred by AS when used with SELECT

SQL Order of Operations

The general order of operations for a SQL query is as follows:

  1. FROM / JOIN
  2. WHERE & subordinates
  3. GROUP BY
  4. HAVING & subordinates
  5. SELECT & subordinates
  6. ORDER BY & subordinates
  7. OFFSET & subordinates