it360 - indexing lab

IT360 Lab 11 - DB Performance

DUE: April 16, 2010, BEFORE start of lab

This lab should get you familiarized with indexing and query efficiency as well as loading a table in the database with data from a .cvs file.

Create a file called yourlastname_Lab11.txt that will contain all your answers for this lab. Write your name at top of the file.

Exercise 1: Consider a disk with average I/O time of 15 milliseconds per page and page size = 4096 bytes. For all of the following questions, the time needed to answer a query is number of pages accessed * average I/O time per page (we ignore the seek time).

Consider a table Products(BarCode, PName, Price) with 200,000 rows of 100 bytes each. The table is stored on disk such that no row spans 2 pages.

Compute the number of pages needed to store the entire table on disk

Compute the approximate time needed to evaluate the following query, assuming no indexes are used:

SELECT * FROM Products

Consider the following query:

SELECT * FROM Products WHERE Price between 10 and 100

Assume that a B+tree index exists on Price, and 75% of the products satisfy the condition in the WHERE clause (Price between 10 and 100). For the following questions, you can ignore the cost of retrieving the index entries, and the main cost of the operation comes from retrieving the rows, after the “pointers” (data entries in the index) are found.

c. Assume that the B+tree index is not clustered. Compute the approximate time (in seconds) needed to evaluate the query using the non-clustered B+tree index.

Assume now that the B+tree index is clustered. Compute the approximate time (in seconds) needed to evaluate the query using the clustered index.

Compute the approximate time (in seconds) needed to evaluate the query without using any index.

How many B+tree indexes can be created for table Products (assume there are no indexes currently created)? Justify your answer.

· Connect to your MySQL database: Runà All Programs à MySQLàMySQL Query Browser àenter schooner.cs.usna.edu as server, mXXXXXX as user, enter your password, and enter mXXXXXX as the default schema.

· Consider a table with the following schema

StudentsEnroll(StudentID, CourseID, ACYearEnd, Semester). Execute the following CREATE TABLE statement to create the table in your MySQL database.

CREATE TABLE StudentsEnroll(

StudentID char(6) NOT NULL,

CourseID char(6) NOT NULL,

AcYearEnd int NOT NULL,

Semester char(6) NOT NULL,

CONSTRAINT PK_StudentsEnroll PRIMARY KEY(StudentID, CourseID)

);

· I’ve created a file with student enrollment data and placed it in my Unix account at /home/adina/Teaching/StudentsEnrollDataMathDiv.csv. The file is accessible by all, including the MySQL server running on schooner.cs.usna.edu.

· Load that data into the StudentsEnroll table that you created, by using the following LOAD DATA statement in your Query Browser (most database systems provide some statement to load data from a file into a table in a database; load data infile is for MySQL):

LOAD DATA INFILE '/home/adina/Teaching/StudentsEnrollDataMathDiv.csv'

INTO TABLE StudentsEnroll

FIELDS TERMINATED BY ','

LINES TERMINATED BY '\n';

To check that everything went well, execute a

“SELECT Count(*) FROM StudentsEnroll” in your Query Browser. You should see 39039 rows.

Exercise 2:

a) Execute the following query:

SELECT CourseID, Count(*) AS NumStudents

FROM StudentsEnroll

WHERE ACYearEnd = 2009 and Semester = 'SPRING'

GROUP BY CourseID;

a.1) What are the steps used by the database system to answer the query? (Click the “Explain” button in the menu, and look at the “Possible keys”, “Key”, “Extra” column in the pane that opens; check the documentation available on the web for MySQL “How to optimize queries with Explain” http://dev.mysql.com/doc/refman/5.1/en/using-explain.html or in the “PHP and MySQL Web Development” book – page 296-300, in particular Table 12.9 on page 300)

a.2) What was the time needed to complete the query? (Time and number of rows fetched by each select query is reported on the same screen as the results of the query, on the bottom – use the first reported time, not the one showing in parentheses).

a.3) Re-execute the above select statement three times, and report the average time.

b) What index (if any) would you create to improve the efficiency of the query? Justify your answer.

c) Write the SQL statement to create the above index in MySQL. Execute the statement in MySQL.

d) Execute again the query

SELECT CourseID, Count(*) AS NumStudents

FROM StudentsEnroll

WHERE ACYearEnd = 2009 and Semester = 'SPRING'

GROUP BY CourseID;

d.1) What are the steps used by the database system to answer the query now? (Click the “Explain” button in the menu, and look at the “Possible Keys”, “Key”, “Extra” column in the pane that opens) If the database did not use the index you just created, try to explain why.

d.2) What was the time needed to complete the query? (Time and number of rows fetched by each select query is reported on the same screen as the results of the query).

d.3) Re-execute the above select statement three times, and report the average time. If the execution time is higher than the execution time you obtained in a.3), try to explain why.

Turn in (due before start of lab on April 16, 2010):

Electronic:

Upload yourlastname_Lab11.txt with all your answers to the Lab 11 assignment on the blackboard.

Hard-copies:

The completed assignment coversheet. Your comments will help us improve the course.
A hard copy of your yourlastname_Lab11.txt containing the answers for each exercise.