Integrating MySQL with Python - Part 2

We saw last time how to use the MySQLdb to connect to a database, run queries, and retrieve the results. Today we will discusst how to store the results of a SELECT query into a pandas data frame for future processing.

MySQL and Pandas - Recommended Approach

Let's assume you have a database connection stored in the db variable and a cursor stored in the cursor variable, and you just executed a SELECT query and fetched all the results. You can construct a pandas data frame from the result fetched from the cursor by creating a DataFrame object and passing the fetched result to the constructor. If you want to give names to the columns in the data frame, corresponding to the names returned from the SELECT statement, we can retrieve that information from cursor.description and passed that to the DataFrame constructor as well.

from mydb import db                # This is our new mydb.py with the connection class
import pandas as pd

query = """
SELECT Alpha, Course
  FROM Grades
 WHERE Course=%s
"""
values = ['SD321']

cursor = db.db.cursor()            # Directly access the MySQLdb object within our connection class
cursor.execute(query, values)      # Execute the query

results = cursor.fetchall()        # Retrieve the results

columns_desc = cursor.description  # Retrieve information about columns

# Get the column names (first element of each tuple in columns_desc
columns_name = [t[0] for t in columns_desc]

# Create a data frame from results
df = pd.DataFrame(results, columns = columns_name)

# Print a few rows from result
print(df.head(4))

MySQL and Pandas - Another Approach

Pandas has built-in support to read from a database using read_from_sql method. However, pandas does not officially support MySQLdb connections, so even if using MySQLdb is currently possible, support might change in the future. A warning is issued when using MySQLdb connection with pandas.read_sql_query.

In order to use pandas.read_sql_query, we need the actual SELECT query, the database connection object, and any parameters for the query, provided as a tuple. Here is an example:

from mydb import db

import pandas as pd

query = '''SELECT Alpha, Course
            FROM Grades
            WHERE Course=%s'''

values = ['SD321']

# Execute query and retrieve results
df = pd.read_sql_query(query, db.db, params=values)

# Print a few rows from result
print(df.head(4))

Problems

  1. Modify your mydb.py file from previous class and create a new method querydf that returns a dataframe as the results instead of a list of tuples.

  2. Build a Python script that uses your db.querydf method. Your script query the database to determine the Alpha, First, Last, Course, and CourseGrade for students with SATM scores below the average.

    The schemas for the tables needed:
    Midn(Alpha, Last, First, Birthdate, SATM, SATV, HighSchoolState)
    Grades(Alpha, AcYr, Sem, Course, Title, 6Week, 12Week, Exam, CourseGrade)