IT360 Lab 11 - DB Performance
DUE: April 16, 2010, BEFORE start of
lab
This lab should get you familiarized with indexing and query efficiency as well as loading a table in the database with data from a .cvs file.
Create a file called yourlastname_Lab11.txt that will contain all your answers for this lab. Write your name at top of the file.
Exercise 1: Consider a disk with average I/O time of 15 milliseconds per page and page size = 4096 bytes. For all of the following questions, the time needed to answer a query is number of pages accessed * average I/O time per page (we ignore the seek time).
Consider a table Products(BarCode, PName, Price) with 200,000 rows of 100 bytes each. The table is stored on disk such that no row spans 2 pages.
SELECT * FROM Products
Consider the following query:
SELECT * FROM Products WHERE Price between 10 and 100
Assume that a B+tree index exists on Price, and 75% of the products satisfy the condition in the WHERE clause (Price between 10 and 100). For the following questions, you can ignore the cost of retrieving the index entries, and the main cost of the operation comes from retrieving the rows, after the “pointers” (data entries in the index) are found.
c. Assume that the B+tree index is not clustered. Compute the approximate time (in seconds) needed to evaluate the query using the non-clustered B+tree index.
· Connect to your MySQL database: Runà All Programs à MySQLàMySQL Query Browser àenter schooner.cs.usna.edu as server, mXXXXXX as user, enter your password, and enter mXXXXXX as the default schema.
· Consider a table with the following schema
StudentsEnroll(StudentID, CourseID, ACYearEnd, Semester). Execute the following CREATE TABLE statement to create the table in your MySQL database.
CREATE TABLE StudentsEnroll(
StudentID char(6) NOT NULL,
CourseID char(6) NOT NULL,
AcYearEnd int NOT NULL,
Semester char(6) NOT NULL,
CONSTRAINT PK_StudentsEnroll PRIMARY KEY(StudentID, CourseID)
);
· I’ve created a file with student enrollment data and placed it in my Unix account at /home/adina/Teaching/StudentsEnrollDataMathDiv.csv. The file is accessible by all, including the MySQL server running on schooner.cs.usna.edu.
· Load that data into the StudentsEnroll table that you created, by using the following LOAD DATA statement in your Query Browser (most database systems provide some statement to load data from a file into a table in a database; load data infile is for MySQL):
LOAD DATA INFILE '/home/adina/Teaching/StudentsEnrollDataMathDiv.csv'
INTO TABLE StudentsEnroll
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
To check that everything went well, execute a
“SELECT Count(*) FROM StudentsEnroll” in your Query Browser. You should see 39039 rows.
Exercise 2:
a) Execute the following query:
SELECT CourseID, Count(*) AS NumStudents
FROM StudentsEnroll
WHERE ACYearEnd = 2009 and Semester = 'SPRING'
GROUP BY CourseID;
a.1) What are the steps used by the database system to answer the query? (Click the “Explain” button in the menu, and look at the “Possible keys”, “Key”, “Extra” column in the pane that opens; check the documentation available on the web for MySQL “How to optimize queries with Explain” http://dev.mysql.com/doc/refman/5.1/en/using-explain.html or in the “PHP and MySQL Web Development” book – page 296-300, in particular Table 12.9 on page 300)
a.2) What was the time needed to complete the query? (Time and number of rows fetched by each select query is reported on the same screen as the results of the query, on the bottom – use the first reported time, not the one showing in parentheses).
a.3) Re-execute the above select statement three times, and report the average time.
b) What index (if any) would you create to improve the efficiency of the query? Justify your answer.
c) Write the SQL statement to create the above index in MySQL. Execute the statement in MySQL.
d) Execute again the query
SELECT CourseID, Count(*) AS NumStudents
FROM StudentsEnroll
WHERE ACYearEnd = 2009 and Semester = 'SPRING'
GROUP BY CourseID;
d.1) What are the steps used by the database system to answer the query now? (Click the “Explain” button in the menu, and look at the “Possible Keys”, “Key”, “Extra” column in the pane that opens) If the database did not use the index you just created, try to explain why.
d.2) What was the time needed to complete the query? (Time and number of rows fetched by each select query is reported on the same screen as the results of the query).
d.3) Re-execute the above select statement three times, and report the average time. If the execution time is higher than the execution time you obtained in a.3), try to explain why.
Turn in (due
before start of lab on April 16, 2010):
Electronic:
Hard-copies: