SD321 (Fall 2024)

Atomic Transactions

A transaction, or logical unit of work (LUW), is a series of actions taken against the database that occur as an atomic unit, with all actions considered to be one large group occurring together. A transaction typically contains multiple SQL statements. With each atomic unit, there is either success or failure, with all transactions being COMMITTed or none being added (ABORT / ROLLBACK).

ACID

ACID refers to a set of properties that guarantee that database transactions are processed reliably. We expect that our databases are always in a consistent state and these properties will help ensure the stability of our data.

Atomic - Each transaction is expected to be all or nothing, if any part of the transaction fails, the entire transaction will fail and the state of the database will return to what it was before the transaction attempted to execute.
Consistent - With each transaction the database must move from one valid state to another. To be valid, all rules (constraints, cascades, triggers) must be enforced and executed successfully.
Isolated - The concurrent transactions act as if they were executed serially, one after another. DB changes by incomplete transactions may not be visible to other transactions until completed and committed to the DB.
Durable - Changes made by committed transactions are permanent, even if there are power losses, crashes, errors, etc.

Atomic

It is easier to view transactions via an example. Let's look at a transaction that has three distinct actions.

Add new order data to CUSTOMER table - Customer 123 just purchased 250 Basketballs at a cost of $6500.
Add new order data to SALESPERSON table - The salesperson is given a commission of 10% or $650.
Add new order data to ORDER table - Which has an SQL trigger associated with it to add the order information to the appropriate tables for processing and shipment (decrementing the ordered number from the available basketballs in stock)

But what would happen if for some reason part of this transaction failed? Let's say that this purchase is invalid as there are only 100 basketballs left and there were no plans to procure any more for this baseball focused store. We need to roll back the individual parts of the transaction if the final part of the online store system fails, otherwise two new problems emerge, the customer is charged for items that will not be delivered, and the salesperson is given a commission on a sale that is not real!

In the event that any part of the transaction fails, we ROLLBACK all parts of the transaction, removing any changes we made, and returning the database to the last valid state. If the entire transaction succeeds we COMMIT the changes to the database and it becomes the new state of the system. A DBMS that facilitates this type of COMMIT and ROLLBACK functionality allows us to achieve atomic transactions!

Consistency

Consistency ensures that the database moves from one valid state to a new valid state when changes are made via either statement level consistency or transaction level consistency.

Statement Level Consistency

With statement level consistency each statement independently processes the rows to ensure consistency, example:

UPDATE CUSTOMER
  SET AreaCode = '410'
  WHERE ZipCode = '21218';

All qualifying rows are updated and no concurrent updates to CUSTOMER are allowed during the update.

Transaction Level Consistency

With transaction level consistency all rows impacted by the SQL statements are protected from changes during the entire transaction. Note that with transaction level consistency a transaction may not see its own changes!

From the MySQL command line client we can start a transaction with START TRANSACTION;, and complete with either COMMIT; or ROLLBACK;, example:

START TRANSACTION;
UPDATE CUSTOMER
  SET AreaCode = '425'
  WHERE ZipCode = '21666'
# More transaction work...
UPDATE CUSTOMER
  SET Discount = 0.25
  WHERE AreaCode = '425'
COMMIT;

In your MySQL interactions with Python, the option to autocommit is by default disabled. In previous lectures we enabled it, but if you want to have control of when items are committed or rolled back, the autocommit needs to be off. Example:

import mysql.connector    # MySQL Connection Library
# Connect to MySQL
try:
  db = mysql.connector.connect(user=dbuser,
        password = dbpass,
        host = dbhost,
        database=dbname)
  db.autocommit = False # omit this line to leave default setting (off) or set autocommit to OFF
except mysql.connector.Error as err:
  print(err)
  exit() # assumes the program should terminate if we cannot connect

Now if we wished to commit, or rollback our work we would use either

db.commit()

to commit, or

db.rollback()

to remove our changes, which can be quite useful if we capture an error within our Python scripts that would impact our work with the database.

Note: Certain statements can not be rolled back. Data Definition Language (DDL) statements such as create, drop, or alter tables or the use of stored procedures may be automatically committed. Thus, a DB maintainer may have to manually undo the changes or revert to a previous backup of the database to undo the effects of these statements.

Isolated

Concurrent transactions are transactions that appear to users as if they are being processed at the same time. In reality, the database will only be executing one transaction at a time, similar to a CPU. However, steps from the two transactions will be interleaved. This could lead to concurrency problems where updates could be lost, or reads could become inconsistent. Imagine you have the scenario where in an online store one user is purchasing ten Snickers bars while a second is buying two Gatorade bottles.

Read nb Snickers (ns=500)
Reduce count Snickers by 10 (ns=490)
Write new nb Snickers back (ns=490)

Read nb Gatorades (ng=200)
Reduce count Gatorades by 2 (ng=198)
Write new nb Gatorades back (ng=198)

This may have been executed in the following order:

Read nb Snickers (ns=500)
Read nb Gatorades (ng=200)
Reduce count Snickers by 10 (ns=490)
Write new nb Snickers back (ns=490)
Reduce count Gatorades by 2 (ng=198)
Write new nb Gatorades back (ng=198)

Transactions that could cause significant issues

We may have a significant problem if both users purchase Snickers at the same time!

Read nb Snickers (ns=500)
Reduce count Snickers by 10 (ns=490)
Write new nb Snickers back (ns=490)

Read nb Snickers (ns2=500)
Reduce count Snickers by 2 (ns2=498)
Write new nb Snickers back (ns2=498)

The DBMS may execute the steps from these transactions in the following order:

Read nb Snickers (ns=500)
Read nb Snickers (ns2=500)
Reduce count Snickers by 10 (ns=490)
Write new nb Snickers back (ns=490)
Reduce count Snickers by 2 (ns2=498)
Write new nb Snickers back (ns2=498)

Visualize the steps of these transactions over time

U1: Read nb Snickers
  (ns=500)
U2: Read nb Snickers
  (ns2=500)
U1: Reduce count Snickers by 10
  (ns=490)
U1: Write new nb Snickers back
  (ns=490)
U2: Reduce count Snickers by 2
  (ns2=498)
U2: Write new nb Snickers back
  (ns2=498)

T1: R(Snickers)

T2: R(Snickers)

T1: W(Snickers)

T1: COMMIT

T2: W(Snickers)

T2: COMMIT

Or in a simpler form:

T1: R(S)        W(S)  Commit
T2:       R(S)                W(S)  Commit

Potential Problems

There are a few issues introduced when we encounter situations like the Snickers example. You must be able to distinguish between these problems. Focusing on what causes the problem is the most helpful way to distinguish between them!

Dirty Reads occur when uncommitted data from a first transaction (T1) is read by a second transaction (T2), and then T1's uncommitted data gets rolled back or aborted after T2 reads it! KEY: caused by uncommitted data being read! Note the dirty read by T2's R(A) step in this example:

T1: R(A), W(A),                     R(B), W(B), Abort(Undo Changes)
T2:             R(A), W(A)

Non-repeatable Reads occur when a transaction reads the same data twice during its steps and in between the reads another transaction updates or deletes a portion of that data. KEY: caused by an UPDATE or DELETE statement modifying data in between reads!

T1: R(A),                     R(A), W(A), Commit
T2:       R(A), W(A), Commit

Phantom Reads occur when during a transaction, two identical queries are executed (say as part of checks at the start and end of a stored procedure), and the rows returned by the second query are inconsistent from the first due to new rows being INSERTED and committed by another transaction. This is very similar to a non-repeatable read and can thus be confusing. KEY: Have to have an INSERT statement to have a Phantom Read!

The difference:
A non-repeatable read occurs when during the course of a transaction a row is retrieved twice and the values within the row differ between reads because another transaction UPDATED or DELETED the row.
A phantom read occurs when, in the course of a transaction, two identical queries are executed and the collection of rows returned by the second query is different from the first because another transaction INSERTED a row.

Solve the problem with Locking

Locking prevents multiple applications from obtaining copies of the same resource when the resource is about to be changed. Terminology:

Implicit Lock - Placed by the DBMS, automatically.
Explicit Lock - Issued by the application program, declared explicitly by an application programmer.
Lock Granularity - Size of the locked resource (rows, table, database level).
Exclusive Lock (X) - Prohibits other users from reading the locked resource.
Shared Lock (S) - Allows other user to read the locked resource, but they can't update it.

Example (with Snickers)

Lock Snickers
Read nb Snickers (ns=490)
Reduce count Snickers by 10 (ns=490)
Write new nb Snickers back (ns=490)
Release Lock

Lock Snickers (failed wait)




Lock Snickers
Read nb Snickers (ns2=490)
Reduce count Snickers by 2 (ns2=488)
Write new nb Snickers back (ns2=488)
Release Lock

Approaches to Locking

There are two primary general approaches to locking:

Optimistic Locking - Assumes that no transaction conflict will occur (preferred for most internet applications due to higher data throughput).
Pessimistic Locking - Assumes that conflict will occur (slower, but safer).

Optimistic - Pseudocode

SELECT PRODUCT.Name, PRODUCT.Quantity
  FROM PRODUCT
  WHERE PRODUCT.Name = 'Pencil';

Set NewQuantity=PRODUCT.Quantity-5;

{Process Transaction, Assume all is OK}

LOCK TABLES PRODUCT WRITE;

UPDATE PRODUCT
  SET PRODUCT.QUANTITY = NewQuantity
  WHERE PRODUCT.NAME = 'Pencil';

UNLOCK TABLES;

{Check to see if write was successful,
 that no other errors were produced
 like an invalid negative PRODUCT.QUANTITY,
 ROLLBACK changes and repeat if necessary}

Pessimistic - Psuedocode

LOCK TABLES PRODUCT WRITE;

SELECT PRODUCT.Name, PRODUCT.Quantity
  FROM PRODUCT
  WHERE PRODUCT.Name = 'Pencil';

Set NewQuantity=PRODUCT.Quantity-5;

{Process Transaction, Assume all is OK}

UPDATE PRODUCT
  SET PRODUCT.QUANTITY = NewQuantity
  WHERE PRODUCT.NAME = 'Pencil';

UNLOCK TABLES;

{No need to check to see if successful
as we executed our lock up front}

Locks and Usage

Most applications do not explicitly declare locks due to their complexity, instead they mark transaction boundaries and declare the locking behavior that they want the DBMS to use. With these markers (BEGIN or START TRANSACTION, COMMIT, and ROLLBACK), if the locking behavior needs to be changed, only the lock declaration passed to the DBMS needs to be changed but not the lock placement in the Persistent Stored Module in the DB. Here is the key table from the Kroenke textbook that you MUST understand to use the appropriate isolation level in MySQL. We want to choose the least restrictive level that ensures an atomic database! For example, if our stored procedure only has a single INSERT statement, NONE of our concurrency problems are possible, so use the least restrictive lock possible:

Let's look over an example in MySQL:

DELIMITER $$
CREATE PROCEDURE insertStudents (Alphavar char(6), LastNamevar varchar(50),
       FirstNamevar varchar(50), Emailvar varchar(100), ClassYearvar int,
     Majorvar char(4))
BEGIN
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;
  IF ClassYearvar > YEAR(CURDATE()) THEN
    INSERT INTO Students(Alpha, LastName, FirstName, Email, ClassYear, Major)
  VALUES (Alphavar, LastNamevar, FirstNamevar, Emailvar, ClassYearvar, Major);
  END IF;
COMMIT;
END $$
DELIMITER ;

You can find more information on the isolation levels offered in MySQL in the documentation.

Oops... Deadlocks

It is possible to put the database into a position where nothing can happen, example:

Database management systems must resolve such deadlocks. First, they must be able to identify that a deadlock exists. Second, they must cancel one of the transactions, ROLLBACK its changes, and allow the other transaction to complete before starting the cancelled transaction over.

Practice with Transactions Activity

Please walk through the following SQL script and execute the commands line by line in order to get more familiar with Transactions in MySQL and selecting the proper isolation level (NOTE: recommend downloading and opening in your favorite text editor for syntax highlighting and readability or reading inline below):

Here is a link to a downloadable SQL file of the code below:Transaction Isolation Levels Example.

/* TRANSACTION ISOLATION LEVELS AND DIRTY READ, NON-REPEATABLE READ,
PHANTOM READ EXERCISE


-- Goal: Show the different database concurrency problems (Dirty Read,
Nonrepeatable read, Phantom Read) and how the Transaction Isolation Levels solve them in MySQL.

Key Point: We have to choose these wisely within our transactions, we want to be
as least restrictive as is safe! (Speed is KING! But an error filled DB is worthless!)

To work through this example you will need to pull two terminals up side by
I will refer to terminal 1 and terminal 2 throughout the notes to correspond
to two separate transactions on the same table.
*/

/* SETUP THE TABLES: */
DROP TABLE IF EXISTS CourseCounts;
DROP TABLE IF EXISTS StudentEnrollment;
DROP TABLE IF EXISTS Courses;
DROP TABLE IF EXISTS Students;

CREATE TABLE Courses(
      CourseNum VARCHAR(10),
      CourseName VARCHAR(25),
      CONSTRAINT PK_Courses PRIMARY KEY (CourseNum)
);

CREATE TABLE Students(
      Alpha INT,
      LName VARCHAR(45) NOT NULL,
      FName VARCHAR(45) NOT NULL,
      CONSTRAINT PK_Students PRIMARY KEY (Alpha)
);

CREATE TABLE StudentEnrollment(
      Alpha INT,
      CourseNum VARCHAR(10) NOT NULL,
      Semester VARCHAR (10) NOT NULL,
      Year INT NOT NULL,
      CONSTRAINT PK_StudentEnrollment PRIMARY KEY (Alpha, CourseNum, Semester, Year),
      CONSTRAINT FK_StudentEnrollment_Students FOREIGN KEY (Alpha)
        REFERENCES Students (Alpha) ON DELETE CASCADE ON UPDATE CASCADE,
      CONSTRAINT FK_StudentEnrollment_Courses FOREIGN KEY (CourseNum)
        REFERENCES Courses (CourseNum) ON DELETE CASCADE ON UPDATE CASCADE
);

CREATE TABLE CourseCounts(
      CourseNum VARCHAR(10),
      Semester VARCHAR(10),
      Year INT,
      CourseCountCur INT NOT NULL DEFAULT 0,
      CourseMax INT NOT NULL DEFAULT 20,
      CONSTRAINT PK_CourseCounts PRIMARY KEY (CourseNum, Semester, Year),
      CONSTRAINT FK_CourseCounts_Courses FOREIGN KEY (CourseNum)
        REFERENCES Courses(CourseNum) ON DELETE CASCADE ON UPDATE CASCADE
);

/* INSERT AT LEAST SOME STUDENTS */
INSERT INTO Students VALUES (255106, 'Smith', 'Paul');
INSERT INTO Students VALUES (255107, 'Duncan', 'Gavin');

DELETE FROM StudentEnrollment;
DELETE FROM Courses;
DELETE FROM CourseCounts;
INSERT INTO Courses VALUES ('NL110', 'Intro to Leadership');

/************************* MYSQL DEFAULT ISOLATION ****************************/

/* Step 1: See the MYSQL default transaction level: */
SELECT @@transaction_ISOLATION;

/***************************** READ UNCOMMITTED *******************************/

-- T1 is a stored procedure to remove a student from a course
-- T2 is a stored procedure to add a student to a course

-- SETUP: Establish a full course for NL110
INSERT INTO CourseCounts VALUES
  ("NL110", "FALL", 2025, 20, 20)
  ON DUPLICATE KEY UPDATE CourseCountCur = 20;

-- Terminal 1:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;

-- T1: R(CourseCounts) OUR SP TRANSACTION TO REMOVE A STUDENT FROM A COURSE
SELECT * FROM CourseCounts;

-- T1: W(CourseCounts) to remove student
UPDATE CourseCounts
  SET CourseCountCur = CourseCountCur - 1
  WHERE CourseNum = "NL110";

-- TERMINAL 2:
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
START TRANSACTION;

-- T2: R(CourseCounts) OUR SP TRANSACTION TO ADD A STUDENT TO A COURSE
-- THIS IS OUR DIRTY READ!
-- T1's W(CourseCounts) Can be ROLLEDBACK in T1 after our logic check.

SELECT * FROM CourseCounts;

-- TERMINAL 1:
ROLLBACK;

-- TERMINAL 2:
ROLLBACK; -- OR COMMIT; -- same effect since no changes were made in T2

-- We have to be able to prevent cross transaction reads of non-committed changes!


/************************** READ COMMITTED ****************************/

-- --------------SHOW DIRTY READ IS NO LONGER POSSIBLE-------------------------
UPDATE CourseCounts
  SET CourseCountCur = 20
  WHERE CourseNum="NL110";

-- TERMINAL 1:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;

-- T1: R(CourseCounts) OUR SP TRANSACTION TO REMOVE A STUDENT FROM A COURSE
SELECT * FROM CourseCounts;

-- T1: W(CourseCounts) to remove student
UPDATE CourseCounts
  SET CourseCountCur = CourseCountCur - 1
  WHERE CourseNum = "NL110";

-- T1: R(CourseCounts) that changes visible within transaction
SELECT * FROM CourseCounts;

-- TERMINAL 2:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;

-- T2: R(CourseCounts)
SELECT * FROM CourseCounts;


/*
NOTE: Key Point, DIRTY READ is solved, we don't see the uncommitted changes!

DIRTY READ = when **uncommitted data** is allowed to be read from another running transaction.

*/

-- TERMINAL 1 
COMMIT;  -- just to finish the transaction
-- TERMINAL 2
COMMIT;

-- --------------SHOW NON-REPEATABLE READ STILL POSSIBLE  ---------------------

-- SETUP:
UPDATE CourseCounts
  SET CourseCountCur = 19
  WHERE CourseNum="NL110";

-- TERMINAL 1:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;

-- T1: R(CourseCounts)
SELECT (CourseMax - CourseCountCur)
  FROM CourseCounts
  WHERE CourseNum = "NL110";

-- TERMINAL 2:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;

-- T2: R(CourseCounts)
SELECT (CourseMax - CourseCountCur)
  FROM CourseCounts
  WHERE CourseNum = "NL110";

-- T2: W(CourseCounts)
UPDATE CourseCounts
  SET CourseCountCur = CourseCountCur + 1
  WHERE CourseNum = "NL110";

COMMIT;

-- TERMINAL 1:
-- T1: R(CourseCounts)
SELECT (CourseMax - CourseCountCur)
  FROM CourseCounts
  WHERE CourseNum = "NL110";

ROLLBACK;

/* NOTE:

KEY POINT: This change in available course seats in the middle of our
T1 transaction (and after our first check) can cause errors in our DB, even though
all changes where legitimate!

NON-REPEATABLE READ = transaction reads same data twice and in between another
transaction has **updated** or **deleted** values!

*/


-- --------------SHOW PHANTOM READ STILL POSSIBLE  -------------------------

-- REITERATE, WE ARE STILL USING READ COMMITTED LEVEL IN BOTH TERMINALS:

-- TERMINAL 1:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;

-- T1: R(Courses) Checking for how many courses we offer, only have enough to offer
-- three courses per semester.
SELECT COUNT(*) FROM Courses;

-- T1: W(Courses)
INSERT INTO Courses VALUES ('IC210', 'Intro to Computing');

-- TERMINAL 2:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
START TRANSACTION;

-- T2: W(Courses)
INSERT INTO Courses VALUES ('SY110', 'Intro to Cyber Security');
INSERT INTO Courses VALUES ('SD211', 'Data Science and Programming I');
COMMIT;

--TERMINAL 1:
-- T1: R(Courses) Checking courses again, new value?!
SELECT COUNT(*) FROM Courses;

-- We better rollback, we don't have enough instructors!
ROLLBACK;

-- SHOW THIS IS DIFFERENT THAN OUR NON-REPEATABLE READ, THIS IS OUR PHANTOM READ!
-- Rather than updated or deleted data (NON-REPEATABLE READ) we have new **INSERTED
-- ROWS**, this is what makes it a PHANTOM READ vice a NON-REPEATABLE READ!

/* NOTE:
  IN MOST DBMS (ORACLE, MICROSOFT SQL), PHANTOM READ isn't solved until the SERIALIZABLE setting.
  in MYSQL, because of implementation differences, PHANTOM READ gets solved
  a step early. See here if you are curious:

  https://stackoverflow.com/a/8302879/5960479
*/

/* STUDENT TODO: repeat above steps using REPEATABLE READ settings to see they aren't possible */

Problems

the questions are related to possible operations in a MIDS-like system. The following tables are involved:
Students(Alpha, LName, FName)
Courses(CourseNum, CourseName)
CourseCounts(CourseNum, Semester, Year,CourseCountCur, CourseMax) StudentEnrollment(Alpha, CourseNum, Semester, Year)

Assume that the following two types of problems can run concurrently in your application and you could also have multiple calls to the same program running concurrently:
spReadCourseCounts: read the rows in the CourseCounts table for courses that will be offered during a particular semester. The procedure uses only the CourseCounts table to read some courses
spPreregisterStudents: records and changes preregistration information. The procedure writes to StudentEnrollment table, modifies the CourseCounts table, and reads from Students, Courses, CourseCounts table, possibly multiple times.

Using the R and W notation used above, write the steps for each of the above procedures.
Give, in plain English, and then by using R (for read), and W (for write) notation that we used above, an example of a schedule for this group of SQL Statements that would lead to a dirty read, or explain why a dirty read cannot happen in this group of SQL Statements.
Give, in plain English, and then by using R (for read), and W (for write) notation that we used above, an example of a schedule for this group of SQL Statements that would lead to an unrepeatable read, or explain why anunrepeatable read cannot happen in this group of SQL Statements.
Give, in plain English, and then by using R (for read), and W (for write) notation that we used above, an example of a schedule for this group of SQL Statements that would lead to a phantom read, or explain why a phantom read cannot happen in this group of SQL Statements.
What transaction isolation level would you use for each of the two procedures above, and why? For each procedure you should use the least restricted transaction isolation level that ensures correctness.