SQL Injections

In the last few weeks we discussed how to run database queries from Python or R. We'll discuss today about SQL Injections and methods to prevent them

SQL Injections, Python, and R

There are a few security concerns that you must consider when building SQL dynamically within your scripts. Let's look at one example: consider that you have the table created with the SQL below:

CREATE TABLE `Users` (
  `logonID` int NOT NULL AUTO_INCREMENT,
  `logon` varchar(45) DEFAULT NULL,
  `password` varchar(550) DEFAULT NULL,
  `accesslevel` int DEFAULT NULL,
  CONSTRAINT pk_access PRIMARY KEY (`logonID`));
With the following data:
INSERT INTO `Users` VALUES
    (1,'admin','d033e22ae348aeb5660fc2140aec35850c4da997',1),
       (261234,'m261234','89e495e7941cf9e40e6980d14a16bf023ccd4c91',5),
    (264433,'m264433','89e495e7941cf9e40e6980d14a16bf023ccd4c91',5);

Let's say that passwords can be verified via the statement below which will return the user's access level (a made up construct in our program example), their logonID, and logon name if the username/password combination is correct, or nothing if the username password combination is invalid. An attempt is considered valid (and access is granted) if any results are returned by the statement below.

SELECT logonID, logon, accesslevel FROM Users WHERE
  logon = 'someusername' AND password = SHA1('somepassword');

Sounds reasonable, right? Can you think of some possible input for either username or password that would result in maybe getting access to the system even if the username/password combination is not correct? Hint: use some special characters in your input.

Protecting against SQL Injections

As a programmer you have several options in your attempt to defend against SQL injections, not all of them successful:

Client side scripts can never be the entire solution, as the end user has the ability to change any of the code on their end and submit what they wish. Server side scripting is safer, but you may not account for all of the possible inputs that evil (or unknowing) users could provide. SQL Prepared Statements/Parametrized Queries are the best method to ensure that you query can not be subverted by bad input, and that is what we used in the course when sending the query with placeholders separately from the data to fill in the placeholders!

Problems

  1. Consider the following Python script:

    from mydb import db
    
    alpha = input("provide alpha:")
    
    # NOTE: THIS IS BAD CODE, DONT USE THIS IN FUTURE ASSIGNMENTS OR LABS.
    #        ALWAYS USE PREPARED STATMENTS.
    query = "SELECT Alpha, First, Last, Course, CourseGrade" +
    " FROM Midn JOIN Grades USING(Alpha)" +
    " WHERE Alpha = " + alpha
    
    cols, result,lastid , error, warning = db.query(query)
    
    for row in result:
        print(row)
    
    db.close()
    The script above is not using parametrized queries, but instead is just constructing a dynamic query by concatenating the user input with the query.
    • What would be a possible "bad" input a user could provide for the problem above, that would result in displaying the data for ALL students instead of the data for a specific student?
    • What would be a possible "bad" input that a user could provide for the problem above, that would result in changing the grade for SD321 for student 251234 to A? Hint: Python allows multiple queries in one command.
    • Change the Python code to prevent this type of problems
  2. Consider the following R script:

    library(RMariaDB)
    
    source("mydbR.R")
    db = dbConnect(RMariaDB::MariaDB(), user=user, password=password, dbname=database, host=host)
    
    alpha = readline("provide alpha:")
    
    # NOTE: THIS IS BAD CODE, DONT USE THIS IN FUTURE ASSIGNMENTS OR LABS.
    #        ALWAYS USE PREPARED STATMENTS.
    query = paste0("SELECT Alpha, First, Last, Course, CourseGrade ",
        "FROM Midn JOIN Grades USING(Alpha) ",
        "WHERE Alpha = ", alpha)
    
    res= dbSendQuery(db,query)
    data = dbFetch(res)
    dbClearResult(res)
    
    if (nrow(data) > 0){
      print("Success: received data")
      print(data)
    } else{
      print("Wrong alpha")
    }
    
    dbDisconnect(db)
    The script above is not using parametrized queries, but instead is just constructing a dynamic query by concatenating the user input with the query.
    • What would be a possible "bad" input a user could provide for the problem above, that would result in displaying the data for ALL students instead of the data for a specific student?
    • Change the script to prevent this type of problems.