Integrating MySQL with R

R offers support for MySQL via multiple packages. In this course we will be using the RMariaDB package to connect from R to a MySQL database server and run queries. MariaDB is MySQL are related, and the MariaDB driver is the one currently used for connections to both MySQL and MariaDB. For more information on RMariaDB see the official documentation.

Installing the package

We will use the RMariaDB package to connect to our MySQL server from R. Before using it, you will need to install the package in your R environment. From the R command line or from R Studio, execute the following line:

# Lets install the required package for R
# We only need to do this once.
install.packages("RMariaDB")

Connecting to the database

The code to connect to the database is similar with what we used from Python, but we need to specify as first parameter an object of class MariaDBDriver:

# Lets connect to the database
library(RMariaDB)
db = dbConnect(RMariaDB::MariaDB(),
               user='username',
               password = 'dbpass',
               host = 'dbhost',
               dbname = 'databasename')  
To test that connection was successful, you could execute a command to list all tables:

dbListTables(db) 
To disconnect, when all queries are completed, usually at the end of your R script:

dbDisconnect(db)
In practice, this method is not what is normally used directly. Please review the next section!

Connecting (in practice)

Since your username and password (for the database) is needed for the connection, we'll create a small mydb.R file which stores the parameters for the connection. This file should normally be stored outside of the R working directory, in a place where other configurations settings are. For now, we will store it in the same dorectory as our other R scripts.

# mydb.R - This file stores the configuration for the database connection.

user     = "m2xxxxx"
password = "YOURPASSWORD"
host     = "db.cs.usna.edu"
database = "m2xxxxx"

# Import the Library
library(RMariaDB)

# Connect to the database
db = dbConnect(RMariaDB::MariaDB(),
      user     = user,
      password = password,
      host     = host,
      dbname   = database )
Then, we'll connect to the database using the configuration file instead of specifying the username/password explicitly in the script:

# Source our mydb file
source("mydb.R")

Performing a basic query

There are a few basic steps needed when performing SQL queries via R and MySQL. In the examples below simple queries will be presented on the left, while dynamic queries are shown on the right.

  1. Create a variable that will store the SQL query. To support dynamic queries, place a ? in spaces that will be filled in by variables later.
  2. Simple Query
    query = "SELECT Alpha, Course
               FROM Grades
              WHERE Course = 'NL110'"
    query2 = "UPDATE Grades
                 SET 6Week = 'A'
               WHERE Course = 'NL110'"
    Dynamic Query
    query = "SELECT Alpha, Course
               FROM Grades
              WHERE Course = ?"      
    query2 = "UPDATE Grades
                 SET 6week = ?
               WHERE Course = ?"      
  3. Provide any variables needed for the dynamic query. We need to provide the variables as a vector with all the values needed:

    # This step not required for simple query!
    course = 'NL110' # Just as an example
    values = c(course)
    values2 = c('A',course)
  4. Execute the query. Send the query and, if needed, the data, to the database, using the database connection. For queries the return data, for example SELECT, we should use the dbSendQuery(), while for other queries, e.g INSERT, UPDATE, DELETE, we should use dbSendStatement()

  5. res = dbSendQuery(db, query)
    res = dbSendStatement(db, query2)
    res = dbSendQuery(db, query, values)    
    res = dbSendStatement(db, query2, values2)    
  6. Optionally retrieve any basic / interesting metadata. Use these options when needed.

  7. row_count = dbGetRowsAffected(res) # Number of rows affected by last insert, delete, or update
    columns_info = dbColumnInfo(res)   # Description of columns selected
    statement = dbGetStatement(res)    # Last SQL statement executed -useful for debugging  
  8. For SELECT queries, retrieve the results. There are multiple ways to retrieve the results of the query, depending on the expected size of the result:
    • you can retrieve a few rows, for example 5, at a time using data = dbFetch(res, n=5)(),
    • or retrieve all of them at once using data = dbFetch(res) (this can be a problem if the result set is very large and does not fit in memory).

    Note that all rows must be retrieved from a result set before another query is sent to the database, otherwise the results are lost.

    # Option 1: Retrieving one row at a time with a loop
    row = dbFetch(res,n=1)         # Fetch a single row
    print(row)                     # Do some work
    while (!dbHasCompleted(res)){
      row = db.Fetch(res,n=1)      # Retrieve the next row
      print(row)                   # Do some work
    }
    # Option 2: Retrieve all of the rows at once.
    allresults = dbFetch(res)
    print(allresults)
  9. Clear the results before executing other queries.

  10. dbClearResult(res)  
  11. ALTERNATIVE: If results fit in memory, instead of sending the query, fetching the result, and clearing the result, we can use dbGetQuery() for SELECT queries, and dbExecute() for INSERT, UDPATE, DELETE queries, and these methods have the send-fetch-clear steps build-in.

  12. res  = dbGetQuery(db, query, values)  # For SELECT
    res2 = dbExecute(db, query2, values2) # For INSERT, UPDATE, DELETE

Problems

  1. Create your mydb.R file based on the "Connecting (in practice)"
  2. Using your mydb.R
    • List all tables in your database
    • Display all the rows in one of your tables
    • Display the Alpha, First, Last, Course, and CourseGrade for students in Midn and Grades tables.
    • Disconnect from the database