SQLite is a self-contained database system with no external dependences, no setup or administration requirements, which allows for a complete database environment contained within a single disk file or in memory. SQLite implements most of SQL92 with transactions also achieving ACID (atomic, consistent, isolated, and durable) compatibility. SQLite is great for use in mobile applications, web applications, and fast data processing and analysis. This lesson will introduce you to SQLite, how to use it, and how you can integrate it with Python or R to perform fast data analysis.
In Linux, sqlite3 is already installed. To create/connect to a database (file):
sqlite3 filename
or to connect to an in memory version:
sqlite3
SQLite version 3.39.3 2022-09-05 11:02:23
Enter ".help" for usage hints.
sqlite>
Most SQL commands (SELECT, INSERT, UPDATE, DELETE, CREATE) work as expected. Other commands that you would normally use to find the status of the database are not available. Below highlights a few of the more important SQLite-specific commands:
From the documentation, each value stored in an SQLite database (or manipulated by the database engine) has one of the following storage classes:
Since the types are limited, let's review a CREATE TABLE example:
CREATE TABLE grades (
alpha INTEGER NOT NULL,
AcYr TEXT,
Sem TEXT,
BlkNbr INT,
Course TEXT,
Title TEXT,
Section TEXT,
s6Week TEXT,
s12Week TEXT,
EndOfTerm TEXT,
Exam TEXT,
CourseGrade TEXT,
Repeated TEXT,
PRIMARY KEY (alpha, AcYr, Sem, BlkNbr, Course)
);
Here's an SQL file you can use to build a small grades database.
SQLite integrates into almost any programming language you may want to use. It is also quick and easy to use right from the command line. We are going to use SQLite to perform data analysis on data from Bob Ross's TV show. You can download it here.
A lot of data in the world is saved in a csv format. SQLite makes it easy to quickly load csv formatted data into a table. The first row of the csv file will automatically become the column names, so you should examine the csv before loading. Start SQLite3 from the command line then enter csv mode and load your csv file:
sqlite3
sqlite> .mode csv
sqlite> .import elements-by-episode.csv bobRossData
or:
sqlite> .import --csv elements-by-episode.csv bobRossData
You can see all the possible arguments for the command by typing .import.
sqlite> .tables
sqlite> .schema bobRossData
sqlite> .save bobRossDB.db
The .save command will save all the tables the in current database (only one for
now) to a file called bobRossDB.db. You can reload that database by passing the filename as an argument to the sqlite3 command on the command line.
$ sqlite3 bobRossDB.db
Now we can work with the data. Let's find out how many episodes Bob Ross painted trees in. There is a TREE column in the table that has value 1 if a tree was painted in the episode and 0 if not.
sqlite> SELECT SUM(TREE) FROM bobRossData;
361
Great! Now in just a few minutes we have been able to use SQL to start extracting information from our data!
Let's find out how many episodes had the word 'SNOW' in the title:
sqlite> SELECT TITLE FROM bobRossData WHERE TITLE LIKE "%SNOW%";
"SNOWFALL"
"SNOW TRAIL"
"SNOWY SOLITUDE"
"CHRISTMAS EVE SNOW"
"BARN IN SNOW OVAL"
"SNOWFALL MAGIC"
"SNOWY MORN"
"SNOWBOUND CABIN"
"FIRST SNOW"
"SNOW BIRCH"
"SPLENDOR OF A SNOWY WHITE"
"BEFORE THE SNOWFALL"
Sometimes we want to save the results of our queries. SQLite lets us easily save query outputs to a csv format. First make sure that we are still in csv mode. You can use
.show to see the
values for mode and other settings.
sqlite> .mode csv
sqlite> .output query_results.csv
sqlite> SELECT EPISODE, TITLE FROM bobRossData WHERE TITLE LIKE "%SNOW%";
sqlite> .output stdout
First, we tell SQLite to be in csv mode, send output to a csv file named 'query_results.csv'. Then we run our query, note that we will not see any results on the screen. Then change the output mode back to the screen or 'stdout'.
SQLite is easy to use from Python. The code below shows the steps to connect
to a data file and make queries. Note the strong similarities with what we
learned about connecting Python with MySQL.
import sqlite3
#Provide the path to your database file, this file is in the same folder as this Python code
db_connect = sqlite3.connect("bobRossDB.db")
# Create a cursor
cursor = db_connect.cursor()
# Use the cursor object to perform queries, inserts, updates, or other operations
my_query = "SELECT TITLE FROM bobRossData WHERE TITLE LIKE \"%SNOW%\""
query_results = cursor.execute(my_query).fetchall()
Great! We can now make queries to the database from Python and from the
command line. But what if we want to take user input to use in the query or
make many queries with a list of values? When using MySQL with Python, we
used %s as the placeholder. For sqlite, we;ll use ? as the placeholder
(like in R)
# like in R, the value we want to replace is represented with a ?
my_query = "SELECT TITLE FROM bobRossData WHERE TITLE LIKE ?"
query_results = cursor.execute(my_query, (my_data, )).fetchall()
You can also get results via .fetchone() or .fetchmany(x) where x is the number of results that you want.
If we want to INSERT or UPDATE data we then have to commit those changes to the database with the .commit() command.
import sqlite3
try:
db_connect = sqlite3.connect("BobRossDB.db")
cursor = db_connect.cursor()
insert _query = “INSERT INTO bobRossData VALUES (‘New Episode’ …)”
cursor.execute(insert_query)
# commit data to file.
db_connect.commit()
except sqlite3.Error as error:
print("Table Insert failed on "+ str(error))
A nice tutorial for using SQLite from R is available at RSQLite.
To use SQLite from R we need to install the RSQLite package.
install.packages("RSQLite")
However, even if we'll use the RSQLite package, the recommendation is to load
the DBI package in a script, since RSQLite implements the DBI
interface.
library(DBI)
To connect to the database, we use the dbConnect method we are familiar with, but this time with an object of class RSQLite as first parameter. Since SQLite is a single-user database, no username and password are required.
# connect to a given database file
mydb = dbConnect(RSQLite::SQLite(), "bobRossDB.db")
# OR connect to a temporary on-disk empty database that will be deleted at disconnect
mydb = dbConnect(RSQLite::SQLite(), "")
# OR connect to a temporary in-memory empty database
mydb = dbConnect(RSQLite::SQLite(), ":memory:")
Once connected, the same methods that work with RMariaDB/RMySQL work with
RSQLite as well.
To disconnect from the database:
dbDisconnect(mydb)
To load some data from a dataframe into a new table in the database with the same name:
dbWriteTable(mydb, "mtcars", mtcars)
To list all tables in the database:
dbListTables(mydb)
To send queries and retrieve results:
my_query = "SELECT TITLE FROM bobRossData WHERE TITLE LIKE ?"
data_query = c(searchTerm)
# Option 1: send query, fetch results, clear results
res = dbSendQuery(mydb, my_query, data_query)
rows = dbFetch(res)
# Option 2: use dbGetQuery which sends the query, revtrieves al results, and
clears the results
rows = dbGetQuery(mydb, my_query, data_query)
Now that we have our data in a database that we can use, one task we may want to accomplish is data visualization. You already used Matplotlib, a great Python library for building plot, graphs, charts, etc. You can see some awesome examples on their website. Matplotlib can easily take in data in a csv format and generate plots and charts. Here is a simple Python program that plots rows read from a database:
import sqlite3
import matplotlib.pyplot as plt
# Connect to the database
db_connect = sqlite3.connect("myDatabase.db")
cursor = db_connect.cursor()
# Perform a query to retrieve the data you want to analyze
query = "SELECT column_name1, column_name2 FROM your_table_name"
cursor.execute(query)
# Fetch all the rows as a list of tuples
data = cursor.fetchall()
# Unpack the tuples into separate lists for each column
column1_data = [row[0] for row in data]
column2_data = [row[1] for row in data]
# Create a bar chart using the data from the two columns
plt.bar(column1_data, column2_data)
# Add labels and a title for the chart
plt.xlabel("Column 1 Name")
plt.ylabel("Column 2 Name")
plt.title("Bar Chart of Column 1 vs Column 2")
# Display the chart
plt.show()
# Or, to save the figure to a file
plt.savefig('output.png')