We saw last time how to use the MySQLdb to connect to a database, run queries, and retrieve the results. Today we will discusst how to store the results of a SELECT query into a pandas data frame for future processing.
Let's assume you have a database connection stored in the db
variable and a cursor stored in the cursor variable, and you just
executed a SELECT query and fetched all the results. You can construct a pandas data frame from the result
fetched from the cursor by creating a DataFrame object and passing the fetched
result to the constructor. If you want to give names to the columns in the data
frame, corresponding to the names returned from the SELECT statement, we can
retrieve that information from cursor.description and passed that to
the DataFrame constructor as well.
from mydb import db # This is our new mydb.py with the connection class
import pandas as pd
query = """
SELECT Alpha, Course
FROM Grades
WHERE Course=%s
"""
values = ['SD321']
cursor = db.db.cursor() # Directly access the MySQLdb object within our connection class
cursor.execute(query, values) # Execute the query
results = cursor.fetchall() # Retrieve the results
columns_desc = cursor.description # Retrieve information about columns
# Get the column names (first element of each tuple in columns_desc
columns_name = [t[0] for t in columns_desc]
# Create a data frame from results
df = pd.DataFrame(results, columns = columns_name)
# Print a few rows from result
print(df.head(4))
Pandas has built-in support to read from a database using read_from_sql method. However, pandas does not officially support MySQLdb connections, so even if using MySQLdb is currently possible, support might change in the future. A warning is issued when using MySQLdb connection with pandas.read_sql_query.
In order to use pandas.read_sql_query, we need the actual SELECT query, the database connection object, and any parameters for the query, provided as a tuple. Here is an example:
from mydb import db
import pandas as pd
query = '''SELECT Alpha, Course
FROM Grades
WHERE Course=%s'''
values = ['SD321']
# Execute query and retrieve results
df = pd.read_sql_query(query, db.db, params=values)
# Print a few rows from result
print(df.head(4))
querydf
that returns a dataframe as the results instead of a list of tuples.
db.querydf method.
Your script query the database to determine the
Alpha, First, Last, Course, and CourseGrade for students with SATM scores below the average.