SQL Quick Review
What is SQL and why do we use it?
SQL is another programming language that you’ll be learning. The difference with SQL though is that it is what’s called a declarative programming language. That means that instead of writing the steps that we want our code to perform (like we’ve been doing with Scheme and Python), we will just write the desired result and leave it up to SQL to figure out the best way to get us that result.
SQL allows us to work with tables, which are components of databases. In 61A specifically, we will be creating and updating tables. Databases and tables are very important to understand because they’re critical for pretty much any software engineering job.
Creating and using tables
A table is a way to organize data in columns and rows, where each column has a name and one type of value (integer, string, etc). We can create a table using the create table operation in SQL, which creates a new table row by row and binds it to a name (kind of like an assignment statement):
create table ice_cream as
select “yogurt park” as shop, 4 as avg_price, “southside” as location
union
select “almare’s”, 5, “downtown” union
select “menchie’s”, 3, “northside” union
select “john’s”, 1, “downtown”;
You can see that we list the values that we want in our table (in order by column), row by row. Note that after the first row, we don’t have to continue including the column names.
Now what can we actually do with a table? Well as you may be able to imagine, these tables can be really huge. Sometimes, only part of the information might be relevant, so we may want to filter out parts of the table. Let’s see how we can do that:
select - selects some or all of the columns of a table
from - chooses table that we’re selecting from
where - filters out rows using a conditional
order by - orders rows by a column; default is ascending (asc), can order by descending by adding desc
limit - Selects the top <integer> rows
Format of a basic SQL query:
select columns
from table
where condition
order by column
asc/desc
limit num_rows
Now how do you go about coming up with a SQL query? By following this set of steps:
Figure out which table you want to get your information from (from)
Figure out which (if any) rows you want to filter out (where)
Figure out how you want to order your output and if you want to limit the number of rows (order by/limit)
Figure out which columns you want to output (select)
Now let’s do an example using the same ice cream table from above. Let’s say that I wanted only the shop name and average price of the cheapest shop in downtown. How could I do that using SQL syntax?
select shop, avg_price
from ice_cream
where location = “downtown”
order by avg_price
limit 1
Step 1: I know that the table that I want to get my info from is the ice_cream table, so that’s what’ll go in my from.
Step 2: I want to consider only rows where location = “downtown” so that’ll be by where condition
Step 3: I want to order by average price because then I know the cheapest ice cream place will be at the top, so I can limit the number of rows to 1
Step 4: I want only the shop name, so I’ll select shop in my select
Not so bad if you break it down right? Now what if you wanted to use information from more than just one table?
Joining tables
In order to use multiple tables in a SQL query, we have to join them into one table. When we join two tables, the resulting table is a cross product of the two tables, meaning that for each row r1 in the left table and each row r2 in the right table, we have a row in our result that is a merge between that r1 and r2. For example, the two tables on the left, when joined, will produce the table on the right. You can see that every combination of rows from the two tables is represented in the result.
Now let’s look at the syntax for joins.
table1, table2 - joins two tables by creating every combination of rows from table1 and table2
as - creates an alias for a table (alias means that we can refer to the table using only the new alias)
Format of a SQL query with a join:
select columns
from table1 as t1, table2 as t2
where condition
order by column
asc/desc
limit num_rows
Now how do you figure out how to write a SQL query with join(s)? Well, we can slightly modify the steps from above to account for joins:
Figure out which table(s) you want to get your information from and if you need to join any tables (from)
Figure out which (if any) rows you want to filter out using where (where)
Figure out how you want to order your output and if you want to limit the number of rows (order by/limit)
Figure out which columns you want to output (select)
Aggregation
One other thing we can do with SQL is use aggregation, which allows us to perform an operation on rows or groups of rows. This can be useful when we want to do things like find an average value of a particular column, or find the number of rows that have some value in a particular column. Some examples of aggregate functions are average, count, min, max, and sum. Note that aggregate functions can only be used in the having or select clauses of a query.
group by - groups rows by values in some column
having - filters out groups using some conditional (you cannot have a having without a group by)
Format of an SQL query with joins and aggregation:
select columns
from table1 as t1, table2 as t2
where condition
group by column
having condition
order by column
asc/desc
limit num_rows
Let’s update our SQL steps to account for aggregation:
Figure out which table(s) you want to get your information from and if you need to join any tables (from)
Figure out which (if any) rows you want to filter out (where)
Figure out if/how you want to group your output (group by)
Figure out if you want to filter out any groups (having)
Figure out how you want to order your output and if you want to limit the number of rows (order by/limit)
Figure out which columns/groups you want to select (select)
Updating Tables
Now it’s great that we’ve learned how to use tables to create new ones, but similar to things like lists and trees, we can also mutate the table, or update the values inside it. We have three operations that we can use:
insert into- inserts a row of values into a table (the row can have values in some or all of the columns)
delete from - deletes any rows that fulfill the given conditional from the table
update - updates values in the specified columns/rows of the table
Let’s see some examples using our ice_cream table from above:
insert into ice_cream values ("cream", 4, "southside");
delete from ice_cream where location = "northside";
delete from ice_cream where location = "northside";