A new virus is ravaging the world. It arrived in the United States and data is pouring in, but it's too much data to analyze. We need someone who can help us process the data and put it into an easy-to-understand visualization! Is there a data scientist on this plane?
In this lab, you will process real COVID data from the CDC and create a state heatmap showing where COVID was reported on a daily basis. You will harness your skills with files, lists, loops, and dictionaries.
Download the CSV files population.csv and covid.csv containing COVID data from 1/22/2020 to 10/18/2022.
(this data was pulled from the CDC portal)
Your instructor will do this example with you.
We've been reading CSV files from scratch -- opening files -- reading lines -- splitting on commas. Now that you understand how all of that works, you get to use an actual CSV library today that does the hard work for you: csv.DictReader
from csv import DictReader
fh = open('population.csv')
myreader = DictReader(fh) # creates an object of type DictReader!
# Loop over the rows (each row is a dictionary now!)
for row in myreader:
    print('The capital of', row['State'], 'is', row['Capital'])
Create a file example.py and copy this in. Run it.
Read this sample program and understand it. See what's happening? You still open the file, but you give the open file handle to a DictReader object. Instead of looping over lines, this DictReader lets you loop over dictionaries that it created from the lines. Each dictionary has the columns as its keys, and the values are ... the values. Make sure you understand how this code works. You can use DictReader in the next step!
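To see exactly what DictReader hands you, here is a minimal sketch that reads from an in-memory string instead of a real file. The two sample rows are made up for illustration and are not the contents of population.csv.

```python
from csv import DictReader
from io import StringIO

# Hypothetical two-row sample standing in for an open file handle.
sample = StringIO("State,Capital\nMD,Annapolis\nPA,Harrisburg\n")

reader = DictReader(sample)
rows = list(reader)           # one dictionary per data line

print(reader.fieldnames)      # column names taken from the header line
print(rows[0])                # the first data line as a dictionary
print(rows[1]['Capital'])     # look up one column of the second line
```

Note that every value in these dictionaries is a string, even when the column holds numbers, so convert with int() or float() before doing math.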
Create a file population.py
This step is like a HW assignment. Write a short program to let the user query state populations. You downloaded the population.csv file above: a text file of state abbreviations and their populations. Your program output should EXACTLY match this:
State: MD        # Remember: the text after each "State: " prompt is user input
6006401
State: PA
12802503
State: CA
39144818
State: exit
Goodbye!
As you write your solution, you must include the following:
Start from this starter code:
from csv import DictReader
# Define the function here
if __name__ == '__main__':
    # Create the state->population lookup dictionary.
    pops = create_population_dict('population.csv')
    # Write your loop for the user input
Define and use a function called create_population_dict(filename) which takes a string filename as an argument, and it returns a Dictionary. The Dictionary maps state abbreviations to population counts. The keys are the abbreviations ('AL') and the values are ints (19884342).
You should not open the population.csv file anywhere in your program except inside create_population_dict(filename)
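The general pattern of turning DictReader rows into a lookup dictionary might look like the sketch below. It is not the required function: build_lookup is a hypothetical helper that takes an already-open handle, and the column name 'Population' is an assumption, so check the header line of your own population.csv.

```python
from csv import DictReader
from io import StringIO

def build_lookup(fh, key_col, value_col):
    """Map each row's key_col value to its value_col value, converted to int."""
    lookup = {}
    for row in DictReader(fh):
        # CSV values arrive as strings, so convert before storing.
        lookup[row[key_col]] = int(row[value_col])
    return lookup

# Hypothetical data; 'Population' is an assumed column name.
sample = StringIO("State,Population\nMD,6006401\nPA,12802503\n")
pops = build_lookup(sample, 'State', 'Population')
print(pops['MD'])
```

Storing ints rather than strings matters in the next step, where the populations are used in a division.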
Create a file daily.py
Write a program that lets the user query COVID counts for one day, normalizing the daily count by population size to get "cases per 100k". The dictionary you build from the COVID counts file should map state abbreviations to cases per 100k (e.g., "MD" -> 8.71). Retrieve the number of cases from the CSV column "new_cases".
We have some starter code for you:
from population import create_population_dict # <-- Your function from Step 1!
from datetime import datetime,timedelta
from csv import DictReader
# Define get_state_values
# You shouldn't need to change anything below this line.
if __name__ == '__main__':
    # Create the state->population lookup dictionary.
    populations = create_population_dict('population.csv')
    # Query user for a desired day.
    datestr = input("Desired date (mm/dd/yyyy): ")
    date = datetime.strptime(datestr, '%m/%d/%Y') # See below for a reminder on datetime!
    # Get the file into a state+value dictionary.
    covids = get_state_values('covid.csv', populations, date)
    print(covids)
Define the function called get_state_values(filename, population_dict, day) which takes a string filename as an argument, a population dictionary, and a datetime object. It returns a dictionary that maps state abbreviations ('MD') to COVID per 100k counts (9.7134).
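The "per 100k" normalization is just a ratio scaled up by 100,000. Here is a small sketch of the arithmetic; per_100k is a hypothetical helper name and the case count of 523 is made up, while the population is the MD figure from the Step 1 sample output.

```python
def per_100k(cases, population):
    """Scale a raw case count to cases per 100,000 residents."""
    return cases / population * 100_000

# Hypothetical case count paired with the MD population from Step 1.
rate = per_100k(523, 6006401)
print(round(rate, 2))
```

Remember that "new_cases" comes out of the CSV as a string, so it needs a numeric conversion before it can go into a calculation like this.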
Reminder! We used datetime in Lab 7. This creates datetime objects from your strings! You give the function your date as a string ("11/01/2021") and the format your string is in ("%m/%d/%Y"), and it creates a nice datetime object that you can use for comparisons like asking if one date is earlier than the other:
if mydate < otherdate:
You'll want to use that in your function when looking at each row of the CSV. You need to check if the row matches your date! Your output should look like this:
python3 daily.py
Desired date (mm/dd/yyyy): 11/30/2021
{'CO': 48, 'MS': 18, 'WV': 42, 'OK': 14, 'NM': 53, 'SD': 156, 'TX': 22, 'AK': 24, 'IN': 61, 'IA': 88, 'AR': 35, 'ND': 90, 'UT': 39, 'HI': 5, 'LA': 12, 'VT': 32, 'KY': 61, 'NV': 31, 'NC': 24, 'CA': 11, 'MN': 230, 'RI': 73, 'ID': 38, 'IL': 44, 'NJ': 36, 'WY': 45, 'MD': 20, 'VA': 22, 'SC': 10, 'ME': 92, 'WI': 102, 'NE': 72, 'MO': 50, 'MT': 56, 'CT': 25, 'AZ': 43, 'OR': 25, 'NH': 71, 'DE': 28, 'MI': 92, 'FL': 10, 'TN': 32, 'KS': 0, 'WA': 19, 'PA': 45, 'MA': 53, 'GA': 14, 'OH': 58, 'NY': 26, 'AL': 14}
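The row-by-row date check described above can be sketched as follows. The rows are hypothetical stand-ins for what DictReader would produce, and the 'date' and 'state' column names (and the date format inside the file) are assumptions; only "new_cases" is named in this lab, so check the header of covid.csv for the rest.

```python
from datetime import datetime

# Hypothetical rows; the 'date' and 'state' column names are assumptions.
rows = [
    {'date': '11/29/2021', 'state': 'MD', 'new_cases': '1100'},
    {'date': '11/30/2021', 'state': 'MD', 'new_cases': '1200'},
    {'date': '11/30/2021', 'state': 'PA', 'new_cases': '5700'},
]

wanted = datetime.strptime('11/30/2021', '%m/%d/%Y')

matches = {}
for row in rows:
    row_date = datetime.strptime(row['date'], '%m/%d/%Y')
    if row_date == wanted:      # datetime objects compare by value
        matches[row['state']] = int(row['new_cases'])

print(matches)
```

Comparing datetime objects rather than raw strings means the check still works if the user types the date with different zero padding than the file uses.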
Copy your daily.py to covid.py. It's time to visualize the US map!
Pause and consider your output from Step 2. You filtered the big CSV file down to just a dictionary lookup of states to COVID counts. That should be all you need to visualize a map as long as you can find an easy-to-use Python mapping library! If you were on your own, you could search Google for "python map of usa" and see what comes up. That's what we're using here!
Plotly's Choropleth Maps are an awesome tool. They even include maps at the county level in the US (scroll down that link and see!). We will use the basic US States map. It's easy to use: put this in an empty file (call it test.py) and run it:
import plotly.express as px
from plotly.colors import sequential as colors
# Create the map object.
fig = px.choropleth(locationmode="USA-states", scope="usa", color_continuous_scale=colors.Oranges,
                    range_color=(0,150),
                    locations=["CA","TX","NY"], color=[66,128,94])
# Show the map in a web browser.
fig.show()
# Save the map as a file.
fig.write_html('covidmap.html')
Run it. See the 3 colored states? Now study the actual code you just ran. See why and how those 3 states were colored? There are two lists (states and values). How can we get these two lists from our dictionary?
Your Task: Modify your new covid.py to generate a map of the USA with its COVID case counts.
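One possible way to split a dictionary into the two parallel lists the map call expects is sketched below. The three entries are copied from the Step 2 sample output, and the names states and values are just illustrative.

```python
# Three entries taken from the Step 2 sample output.
covids = {'CA': 11, 'TX': 22, 'NY': 26}

# keys() and values() iterate in the same order, so the lists stay aligned.
states = list(covids.keys())
values = list(covids.values())

print(states)
print(values)
```

Lists like these could then be passed as the locations= and color= arguments in place of the hard-coded ones in test.py.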
Your map looks pretty good, but not every state reports values every day, so you have some blanks. The counts for one day are also unreliable, and we'd prefer to show an average over multiple days. Change your covid.py from Step 3 so that instead of showing the one day, you show the average over the prior week.
This is not a big change to your program. You just need to change get_state_values so that instead of matching one day, you match the 6 prior days too, and sum all the values you see. I recommend when you add them up, you multiply each addition by 1/7. That's a quick and easy way to get the average without extra loops later.
How do you know which dates are 6 days before the desired date? Take advantage of the datetime object.
desired = datetime.strptime('10/03/2022', '%m/%d/%Y')
# 6 days in the past: will be 09/27/2022
past = desired - timedelta(days=6)
You can add/subtract with timedelta and the datetime library does the math for you.
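Putting the date window and the 1/7 trick together might look like this sketch. The readings are hypothetical (date, state, cases per 100k) tuples rather than real rows from covid.csv.

```python
from datetime import datetime, timedelta

desired = datetime.strptime('10/03/2022', '%m/%d/%Y')
past = desired - timedelta(days=6)

# Hypothetical (date, state, cases per 100k) readings.
readings = [
    ('09/26/2022', 'MD', 70.0),   # one day before the window, skipped
    ('09/27/2022', 'MD', 14.0),
    ('10/01/2022', 'MD', 7.0),
    ('10/03/2022', 'MD', 21.0),
]

averages = {}
for datestr, state, value in readings:
    day = datetime.strptime(datestr, '%m/%d/%Y')
    if past <= day <= desired:    # inside the 7-day window, inclusive
        # Add one seventh of each value so the running sum is the average.
        averages[state] = averages.get(state, 0) + value * (1 / 7)

print(averages)
```

With this approach a day that has no report simply contributes zero, so the result is always the total divided by 7, not by the number of days that reported.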
Create a new program trends.py and generate map trends. Give the user flexibility to enter a date range that they'd like to view:
Desired Start: 10/01/2020
Desired End: 01/01/2021
Your program should then generate a series of maps (saved as separate HTML files named by date), one for every 7 days within their date range. In the example above, your first map is the week average ending on 10/01/2020, followed by 10/08/2020, etc. Stop when you hit a week beyond their end date.
Your solution should create a new function generate_maps(filename,populations,start,end) that does all the work.
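Stepping through the range one week at a time could be sketched like this. weekly_dates is a hypothetical helper, the file-naming scheme is only one option, and the loop assumes "stop when you hit a week beyond their end date" means the last map's date must not fall after the end date.

```python
from datetime import datetime, timedelta

def weekly_dates(start, end):
    """Return start, start + 7 days, ... for every date that does not pass end."""
    dates = []
    current = start
    while current <= end:
        dates.append(current)
        current += timedelta(days=7)
    return dates

start = datetime.strptime('10/01/2020', '%m/%d/%Y')
end = datetime.strptime('01/01/2021', '%m/%d/%Y')

for day in weekly_dates(start, end):
    # Hypothetical naming scheme: one HTML file per week-ending date.
    print(day.strftime('%Y-%m-%d') + '.html')
```

Each date in the list would be the "desired" day for one weekly-average map.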
Turn in your three programs (population.py, daily.py, and covid.py):
submit -c=sd211 -p=lab09 population.py daily.py covid.py