JOINs Continued...

Thus far, we have been using the FROM and WHERE clauses to join two tables.

SELECT m.First, m.Last, g.Course
  FROM Midn m, Grades g
  WHERE g.Alpha = m.Alpha;
You can imagine this pictorially,


In essence we have been performing what is referred to as an equi join, a special case of inner join.

SELECT First, Last, Course
  FROM Midn INNER JOIN Grades ON Grades.Alpha = Midn.Alpha;

INNER JOINs

Theses three methods are equivalent:

SELECT m.Alpha, First, Last, Course
  FROM Midn m, Grades g WHERE g.Alpha = m.Alpha;
SELECT m.Alpha, First, Last, Course
  FROM Midn m INNER JOIN Grades g ON g.Alpha = m.Alpha;
SELECT m.Alpha, First, Last, Course
  FROM Midn m JOIN Grades g ON g.Alpha = m.Alpha;

With the JOIN keyword, in MySQL, we have the option to use either ON or USING

SELECT Alpha, First, Last, Course
  FROM Midn JOIN Grades USING(Alpha);
USING makes it easier to specify the columns we join on, and assumes that the column(s) with that name exists in both tables, and the values must be equal. It removes the ambiguity that we have been working around when we select a field like Alpha (we have had to select what table to pull Alpha from, ex. Midn.Alpha). NOTE: Rows with NULL values in the join column(s) will not be part of the join answer, as NULL will not match any other value including NULL!

OUTER JOINs

While INNER JOINs return the those rows that intersect the two tables, OUTER JOINs can be used to return all values from a table and to include those records from the secondary joined table when they intersect, or NULL values (for the columns in the joined secondary table) otherwise.

SELECT First, Last, Course
  FROM Midn LEFT JOIN Grades ON Grades.Alpha = Midn.Alpha
  ORDER BY Last, First;
FirstLastCourse
CerseiLannisterNULL
JaimeLannisterNULL
TyrianLannisterHE111
TyrianLannisterHH104
TyrianLannisterNL110
TyrianLannisterNS101
TyrianLannisterSC111
TywinLannisterNULL
BranStarkNL110
EddardStarkNULL
RobStarkNL110
Below is one way of using this method to determine who has not taken a course

SELECT First, Last, Course
  FROM Midn LEFT JOIN Grades ON Grades.Alpha = Midn.Alpha
  WHERE Course IS NULL
  ORDER BY Last, First;

NATURAL JOINs

NATURAL JOINs are similiar to INNER JOIN except that it will use matching column names automatically for you. While this is the most lazy way possible, there is considerable risk if you are not paying attention and columns with the same name in the two tables have different meanings.

SELECT First, Last, Course
  FROM Midn NATURAL JOIN Grades;

Other JOINs

We have shown a LEFT JOIN, but similarly there is a RIGHT JOIN that uses the second table as the primary and will retrieve all rows of the right table and those in the left that match.

Set Operators

The set operators can be used to work with the results from multiple queries. The UNION operator allows us to combine the results of two SELECT statements, though they must have the same number of columns and data types. Duplicative rows are removed by default (use UNION ALL to retain duplicates). Below is an example of using UNION. Note this is not the most efficiant way to write that particular query. (the best way is to use just one select))

SELECT First, Last, Course
  FROM Midn JOIN Grades USING(Alpha)
  WHERE Course = 'NL110'
UNION
  SELECT First, Last, Course
    FROM Midn JOIN Grades USING(Alpha)
    WHERE Course = 'HH104';
There are additional set operators supported by some Relational DBMS: INTERSECT will return only rows that return identical values from both SELECT statements, and EXCEPT takes the two results and only returns those rows that exist in the first result but not in the second (similar with a minus in math). INTERSECT and EXCEPT are not implemented in MySQL.

More subqueries

We have seen before that we can have a SELECT query within another SELECT query. The "interior" queries are called subqueries. We can have subqueries within subqueries as well, as many as needed to answer the use question. The subqueries are always enclosed in parantheses ().

We can use a subquery that returns a value anywhere a value is needed (for example in SELECT, WHERE, or HAVING clauses). For example, the query below returns all the rows in the Grades table with the minimum (or maximum in real value, as an A will be the highest grade at USNA but the minimum value as a string) course grade:

SELECT *
  FROM Grades
 WHERE CourseGrade = (SELECT MIN(CourseGrade) FROM Grades);

We can use a subquery that returns a list of values everywhere a list is required, for example in a WHERE clause using IN or NOT IN. The query below returns all the students who never got an A in a class:

SELECT Alpha, Last, First
  FROM Midn
 WHERE Alpha NOT IN (SELECT Alpha
                       FROM Grades
                      WHERE CourseGrade = 'A');

We can use a subquery that returns a set of rows (one columns or more) as a temporary table, anywhere a table can be used, for example in the FROM clause. The query below returns a list of the lowest performers in each course:

SELECT First, Last, g.Course, g.CourseGrade
  FROM Midn m, Grades g, (SELECT Course, MAX(CourseGrade) AS MinGrade
                            FROM Grades
                           GROUP BY Course) AS newtable
 WHERE m.Alpha = g.Alpha
   AND g.Course = newtable.Course
   AND g.CourseGrade = newtable.MinGrade;

Bottom line is that subqueries can be used in a multitude of situations. Note however, that using joins is often more efficient than using subqueries, if both of the options are available to answer a question.

SQL Misceleanous

SELECT ... LIMIT

Sometimes you want to return only a few rows from the result of a query, instead of returning all rows that satisfy the query. MySQL has a LIMIT clause in the SELECT statement that can be used for that. Below is an example that returns up to 100 rows from the Midn table:

SELECT *
  FROM Midn
 WHERE Alpha like '25____'
 LIMIT 100;

One can also specify an offset or number of rows to skip (2 in the example below), before returning the specified number of rows (100 in the example below).

SELECT *
  FROM Midn
 WHERE Alpha like '25____'
 LIMIT 2,100;

More information about LIMIT can be found in the MySQL SELECT documentation.

More LOAD

As discussed in a previous lecture, LOAD DATA is a very useful MySQL command used to load a text file into a table. More information can be found in the MySQL LOAD DATA documentation. Below is an example of using LOAD DATA to insert data from a sample testData.csv file stored on the local computer into a table. The data has some missing values, which will be replaced by NULL in the table:

CREATE TABLE Test(column1 INT NOT NULL, column2 VARCHAR(50), column3 INT);

LOAD DATA LOCAL INFILE 'testData.csv'
  INTO TABLE Test
  FIELDS TERMINATED BY ','
  LINES TERMINATED BY '\r\n'
  IGNORE 1 ROWS
  (column1, @vcol2, @vcol3)
  SET column2 = NULLIF(@vcol2,''),
      column3 = NULLIF(@vcol3,'');

Note that you might need to configure your mysql client to accept uploading local files to the database server. In Unix, you can do that by creating a file ~/.my.cnf with the content:

!includedir /etc/mysql/conf.d/

[client]
local-infile=1

Practice Problems

Here is some additional data that we can use towards problems, take a look at a newer version of insert_Midn_data.sql which has two additional tables.

  1. Lets Join Lots of Tables
    1. Retrieve the Alpha, First, and Last from the Midn table.
    2. Now, LEFT JOIN the Grades table and add AcYr, Course, Title, and EndOfTerm columns.
    3. Next, LEFT JOIN the Sections table and add the InstID column.
    4. Finally, LEFT JOIN the Prof table and add the Professors first and last name (remember to deconflict with the Midn Table for First and Last.)
  2. Find all students who have failed a course (use JOIN keyword).
  3. Using a UNION show all of the courses that Tyrian Lannister or Bran Stark have taken.
  4. Rewrite the Last query without using UNION but with using a NATURAL JOIN
  5. Find the students who have taken NL110
    1. The original way. (Using implicit join -> WHERE m.Alpha = g.Alpha )
    2. With subqueries in WHERE condition
    3. With subqueries in FROM statement
    4. With explicit joins (JOIN, INNER JOIN, LEFT JOIN, NATURAL JOIN)
    5. Would it work with a subquery returned by using GROUP BY and HAVING?