A New Example

Let's assume that we want to capture some of the basic information that is stored within the MIDS system at the Academy (Information that is usually provided by the Midshipmen - Query Academic Information page). An Example.

What type of table schema would you create for these tables? The examples, Midshipmen and Grades are very rudimentary tables that only partially cover the information of the page, but will be used for the examples in this lesson. The page insert_Midn_data.sql can be used to populate this information into your databases.

If you review the schemas for those two tables, you will see that they share the Alpha column and nothing else. One table contains the information about the midshipmen, while the other will hold their Grades. If we want to run queries that return information from both tables at the same time we will need to join these two tables in our queries. Below are selected columns from both tables.

Table Midn
AlphaLastFirstBirthdateSATMSATVHighSchoolState
261000StarkEddard1956-08-14710640AK
261001StarkRob1994-03-24510740AK
261002StarkBran1999-12-11800770AK
261234LannisterTyrian1995-01-01780790GA
261235LannisterCersei1993-02-14580620GA
261236LannisterTywin1945-09-12630410GA
261237LannisterJaime1993-02-15350510GA
Table Grades
AlphaAcYrSemCourseTitle6Week12WeekExamCourseGrade
2610012023SPRINGNL110Preparing to LeadABFF
2610022023SPRINGNL110Preparing to LeadBCBA
2612342023SPRINGHE111Rhetoric and Intro to LiteratureBBB
2612342023SPRINGHH104American Naval HistoryAAAA
2612342023SPRINGNL110Preparing to LeadAABA
2612342023SPRINGNS101Fundamentals of SeamanshipBBBB
2612342023SPRINGSC111Foundations of Chemistry ICCDC

The SELECT Statement Continued

The SELECT statement is used to retrieve information from a database. Basic syntax:

SELECT [DISTINCT] column_name(s)|*
  FROM table_name(s)
[WHERE conditions]
[ORDER BY column(s) [ASC | DESC]]  

We will now expand our work with the FROM and WHERE clauses so that we can join information from more than one table. There are two important steps: all of the tables that will be used must be added to the FROM statement, and we must link them in the WHERE statement. As an example, what if we wanted to find the names of the students who are in NL110 using the data above?

SELECT m.Alpha, First, Last
  FROM Grades g, Midn m
 WHERE g.Alpha = m.Alpha AND Course = 'NL110';

AlphaFirstLast
261001RobStark
261002BranStark
1234TyrianLannister
What is with letters for tables?

A table referenced in the FROM clause of a SELECT query can be aliased using

table_name alias_name
or we can add AS to make our actions more obvious and our code more readable

table_name AS alias_name
and the alias_name (usually shorter than the table_name) can be used for that query.

Conceptually, what is happening?

The SQL SELECT query will be evaluated using the following conceptual strategy (in reality a more efficient strategy is used):

  1. Compute the cross-product of the tables listed in FROM. Let's look at abbreviated versions of our Midn and Grades tables to see what it means to calculate the cross-product (AKA cartesian product):

  2. Table Grades (Abbreviated)
    AlphaAcYrSemCourse
    2610012023SPRINGNL110
    2610022023SPRINGNL110
    Table Midn (Abbreviated)
    AlphaLastFirst
    261000StarkEddard
    261001StarkRob
    261002StarkBran
    Table Grades, Midn Cross-Product
    AlphaAcYrSemCourseAlphaLastFirst
    2610012023SPRINGNL110261000StarkEddard
    2610012023SPRINGNL110261001StarkRob
    2610012023SPRINGNL110261002StarkBran
    2610022023SPRINGNL110261000StarkEddard
    2610022023SPRINGNL110261001StarkRob
    2610022023SPRINGNL110261002StarkBran
  3. Discard resulting rows if they fail the conditions specified in WHERE clause. In this case, if the First Alpha does not match the second Alpha or the Course is not NL110 (in our abbreviated tables, every Course entry equals NL110!)
  4. AlphaAcYrSemCourseAlphaLastFirst
    2610012023SPRINGNL110261001StarkRob
    2610022023SPRINGNL110261002StarkBran
  5. Delete columns that are not requested in SELECT clause
  6. AlphaLastFirst
    261001StarkRob
    261002StarkBran
  7. if DISTINCT is specified, eliminate duplicate rows
This strategy is probably the least efficient way to compute a query, but the logic is correct and we can follow it! The DB optimizer will work to find more efficient strategies to compute the same answers.

What about ambiguity?

What happens when we try to return the Alpha and do not specify from which table it comes?

SELECT First, Last, Alpha
  FROM Midn m, Grades g
 WHERE m.Alpha = g.Alpha AND g.Course = 'NL110';
ERROR 1052 (23000): Column 'Alpha' in field list is ambiguous
Since both tables have the column Alpha, the database system does not know which one to provide, and it is not going to make the assumption that they are the same! You will have to choose!

SELECT First, Last, m.Alpha
  FROM Midn m, Grades g
 WHERE m.Alpha = g.Alpha AND g.Course = 'NL110';

Calculations in MySQL

MySQL has support built in for basic arithmetic and there are five aggregating functions we will focus on: COUNT, SUM, AVG, MIN, and MAX.

Simple Math
SELECT First, Last, SATM+SATV AS SAT FROM Midn;
FirstLastSAT
EddardStark1350
RobStark1250
BranStark1570
TyrianLannister1570
CerseiLannister1200
TywinLannister1040
JaimeLannister860

You may have noticed that it added the two values together, and we renamed the resulting column using the AS keyword!

Now trying some of the aggregate operators we can work on the entire data set at once. Review the following examples:

SELECT AVG(SATM+SATV) AS AverageSAT
  FROM Midn;
AverageSAT
1262.8571
SELECT COUNT(*) AS TotalStudents
  FROM Midn;
TotalStudents
7
SELECT SATM+SATV AS SAT, First, Last
  FROM Midn
 WHERE SATM+SATV = (SELECT MIN(SATM+SATV) FROM Midn);
SATFirstLast
860JaimeLannister

Grouping Rows together

If we wanted to find the the lowest grade in each class, we could attempt run a separate query for each course in the database.

SELECT MAX(CourseGrade)
  FROM Grades
 WHERE Course = 'NL110';
MAX(CourseGrade)
F
This would quickly become painful if we had hundreds of courses in which to check, it would be easier for us to have the database do the work for us!

The GROUP BY clause

SELECT Course, MAX(CourseGrade)
  FROM Grades
 GROUP BY Course;
CourseMAX(CourseGrade)
HE111B
HH104A
NL110F
NS101B
SC111C

Conceptually, what is happening with GROUP BY?

The SQL query will be evaluated using the following strategy:

The HAVING clause

The HAVING clause allows us to further restrict the results of the GROUP BY by removing groups that do not meet the additional constraints.

SELECT Course, MAX(CourseGrade)
  FROM Grades
 GROUP BY Course
HAVING MAX(CourseGrade) > 'B';
CourseMAX(CourseGrade)
NL110F
SC111C

Practice Problems

  1. Create a query that will return the number of A's given as final course Grades in the Grades table
  2. Write a single query that will provide Last name and average combined SAT score for each of the families (people with the same last name)
  3. Display the First name, Last name, course, and exam Grades for the Stark Family
  4. Add Alpha to the list of columns to be displayed in the Last query
  5. What is the highest grade in each course?

Advanced Problems

  1. Find the lowest performers in each class