Throughout the semester we have taken on the challenge of Database Design, covering the range of topics from Requirements analysis, conceptual and logical design, schema refinement and normalization, and we have reached the final topic of Physical Tuning. In these final discussions it is time to review how the database stores information to disk, and how we can work to make our databases run faster, speed = money in the real world!
MySQL, and other Database Management Systems, need effecient methods in which to store your data. At a basic level you can imagine a file as a collection of records. The data is stored and retrieved in units called disk blocks or pages, and unlike the speeds provided by the host system's RAM, the time to retrieve a disk page varies depending upon the location of the information on disk. Knowing this, the relative placement of pages on disk can have a major impact on DBMS performance. The time needed for this Input/Output (I/O) time will be the largest factor in calculating the time for queries.
Let's consider the following classes of queries:
SELECT * FROM Product WHERE BarCode = 10002121;
SELECT * FROM Product WHERE Price BETWEEN 5 AND 15;
An index on a file speeds up selections on search key columns, and any subset of columns of a table can be the search key for an index on the table! The indexing method that we choose should be targeted to the class of queries we wish to speed up processing (either equality or range queries).
For queries where checking for items that are exactly equal to a value, we will want to use an index based on hashing. This provides for constant search time.
For queries where searching for items within a specified ranges are used often enough to warrant an index, we will typically use B+ trees. This achieves \(O(log_d N)\) search time where the fan out is \(d\) (typically approx 150) and the number of data entries is \(N\).
If the order of the rows on the hard disk is the same as the order of the data entries, this is known as a clustered index. A file can be clustered on (at most) one search key, and as a result the speed of retrieving data records through the index varies greatly based on whether or not the index is clustered. The following diagram will hopefully illustrate the importance of clustering the data on the key that will be used most frequently.
The syntax within MySQL to create an index has a format similar to other MySQL structures
CREATE [UNIQUE] INDEX index_name
[USING index_type]
ON table_name (col1, ...)
Where index_type is either BTREE or HASH. Example:
CREATE INDEX I_ItemPrice
USING BTREE
ON Product (Price);
It is useful to understand and calculate the time necessary to read data from the database. Lets walk through an example.
Here are a few of our settings and the spead of our example drive:
And these are a few of the necessary formulas we should consider:
The largest variable in the information will be the Read time (per page) when retrieving data. When working with spinning disk drives (HDD), the primary factors are:
If we transitioned from spinning drives to solid state drives we would have significant improvements in time. Why? No moving parts. Lets look at the primary factors with SSDs:
When we are adding indexes to our database, there are a few key questions we should consider:
SELECT * FROM Product WHERE Price BETWEEN 5 AND 10