Lab Due in 2 weeks -- Oct 4
Today's lab explores a dataset that contains most of the world's countries and data on the life expectancy of their populations across 6 decades. You will write a program that provides an easy interface for a user to compare specific countries to each other, and ultimately discover a trend in life expectancy that you may not have been aware of.
In today's lab, you will practice data processing, data visualization, and human-data interaction.
Create a new folder lab05 inside your SD211 folder for this lab. Create an initial Python program called life.py.
As in last week's lab, the life expectancy data comes as a CSV file.
Download: life-expectancies.csv
Open it up and take a look. You'll see the columns are pretty straightforward to understand:
Country Name,Country Code,Indicator Code,1960,1961,1962,1963,1964,1965,1966,1967,1968,1969,1970,1971,...,2020
...
United States,USA,SP.DYN.LE00.IN,69.7707317073171,70.2707317073171,70.119512195122,69.9170731707317,...
...
As in the prior lab, this lab uses the Plotly visualization library. Since you installed it last week, there is nothing new to do. However, if you need to reinstall, you only need plotly and pandas:
conda activate sd211
conda install plotly pandas
Write your first program, life.py, so that it asks the user to input one country. Your program should then open the CSV file, loop over the lines, and graph the life expectancy of that one country from 1960 to 2021.
You know about CSV files, loops, and lists, so asking the user for a country and pulling out the numbers is in your wheelhouse.
Graphing a line is sort of new. We did a scatterplot in the last lab -- which is just a "line" of points without lines connecting them. Not surprisingly, you don't need to do much to graph a line except change the mode argument:
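import plotly.graph_objects as go

# YEARS and VALUES are the lists you build from the CSV; nation is the country's name
fig = go.Figure()
# mode='lines+markers' connects the points; mode='markers' alone would leave them unconnected
fig.add_traces(go.Scatter(x=YEARS, y=VALUES, mode='lines+markers', name=nation))
fig.show()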
Now write your program. Open the file, read each line, split it, find the country you want, and save the information you need in lists.
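The reading step above can be sketched as a small function. This is one possible approach under stated assumptions, not the required solution: load_country is a hypothetical name, and the sketch assumes the first three columns are Country Name, Country Code, and Indicator Code, with one column per year after that, as in the header shown earlier. It swaps Python's built-in csv module in for a plain split(','), on the assumption that some country names may contain commas inside quotes, which a plain split would break apart. Blank cells are treated as missing data and skipped.

```python
import csv


def load_country(lines, country):
    """Return (years, values) for one country, or ([], []) if it isn't found.

    lines can be an open file or any iterable of CSV text lines.
    Assumed columns: Country Name, Country Code, Indicator Code, then one per year.
    """
    reader = csv.reader(lines)
    header = next(reader)
    year_columns = header[3:]  # the year labels, e.g. '1960', '1961', ...
    for row in reader:
        if row and row[0] == country:
            years = []
            values = []
            for year, cell in zip(year_columns, row[3:]):
                if cell.strip():  # skip blank cells (missing data)
                    years.append(int(year))
                    values.append(float(cell))
            return years, values
    return [], []
```

To use it, open life-expectancies.csv with newline='' (the setting the csv module documentation recommends for files passed to csv.reader) and pass the file object together with the name the user typed; the returned years and values lists can go straight into go.Scatter(x=..., y=...).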
REQUIRED:
Your program output in the terminal should simply match this:
Country: United States
Your program should then pop up a line graph of the country's life expectancy, as shown in the image above.
Copy life.py to lives.py
Expand your program to allow for multiple countries to be graphed at the same time. The user will enter them separated by commas:
Country: United States,Canada,Mexico
You can assume the user always separates countries with commas and no spaces. Your graph for this input should look like the image shown to the right.
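For several countries, split the user's line on commas and collect every matching row in a single pass over the file. A minimal sketch under the same column-layout assumption as before; load_countries is a hypothetical helper, and it returns a dict mapping each name that was found to its (years, values) lists.

```python
import csv


def load_countries(lines, names):
    """Return {name: (years, values)} for every requested name found in the CSV."""
    wanted = set(names)
    reader = csv.reader(lines)
    year_columns = next(reader)[3:]  # skip Country Name, Country Code, Indicator Code
    found = {}
    for row in reader:
        if row and row[0] in wanted:
            years, values = [], []
            for year, cell in zip(year_columns, row[3:]):
                if cell.strip():  # a blank cell means no data for that year
                    years.append(int(year))
                    values.append(float(cell))
            found[row[0]] = (years, values)
    return found
```

The names list comes straight from the input: 'United States,Canada,Mexico'.split(',') gives ['United States', 'Canada', 'Mexico']. A misspelled name simply won't appear in the dict, so check for missing keys before plotting, and loop over names (rather than over the dict) when adding traces if you want the legend in the order the user typed.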
You can add multiple lines to your Plotly figure by creating multiple Scatter traces and passing each one to add_traces(). Add them all before you call show(). The following illustrates a figure with two lines:
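import plotly.graph_objects as go

# Give each trace its own name so the legend can tell the lines apart
# (other_nation is a placeholder for the second country's name)
fig = go.Figure()
fig.add_traces(go.Scatter(x=YEARS, y=VALUES, mode='lines+markers', name=nation))
fig.add_traces(go.Scatter(x=YEARS, y=MOREVALUES, mode='lines+markers', name=other_nation))
fig.show()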
After you get multiple countries plotting correctly, add a loop so that your program continually asks for more plots until the user enters 'quit'. Each input from the user should create a new Figure and a new plot. The following example creates 3 different plots.
Country: United States,Canada,Mexico
Country: China,Hong Kong
Country: Australia,New Zealand,Philippines
Country: quit
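One way to structure the 'quit' loop is a generator that keeps prompting and hands back one list of names per line. This is a sketch, not the required design: country_requests is a hypothetical name, and taking the input function as a parameter (defaulting to the built-in input()) is an assumed choice that makes the loop easy to test without typing.

```python
def country_requests(read=input):
    """Yield one list of country names per prompt until the user types 'quit'.

    read is the function used to get a line of input; by default the built-in input().
    """
    while True:
        line = read('Country: ')
        if line.strip() == 'quit':  # stop prompting; stray spaces are ignored
            return
        yield line.split(',')  # e.g. 'China,Hong Kong' -> ['China', 'Hong Kong']
```

In lives.py the main part of the program could then be a for loop over country_requests(), creating a new go.Figure() inside the loop body, adding one trace per name, and calling show(), so every request gets its own plot.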
Continue editing lives.py
When communicating data analysis and results, you want attractive visuals. This might seem superficial and trite, but crisp, attractive visuals help draw the reader (or superior officer) into the conversation and analysis. Boring, ugly graphs can subtly turn off whoever is looking at your output, and usually that's not what you want!
The Plotly library has a lot of settings that you can tweak in its go.Figure() objects. I'm going to show you some of them here, and you should adopt these as well as make any changes you see fit. At a minimum, you must incorporate these settings into your program in the correct place.
# Creates a settings variable with values that you can tweak
settings = dict(
    showline=True,
    showgrid=False,
    showticklabels=True,
    linecolor='rgb(204, 204, 204)',
    linewidth=2,
    ticks='outside',
    tickfont=dict(family='Arial', size=12, color='rgb(82, 82, 82)')
)
Once you have this settings variable, you can then update your Figure with it:
fig = go.Figure()
fig.update_layout(xaxis=settings,yaxis=settings)
The above changes axis details such as line color, line width, and tick font. But that's just the axes. You should also add titles to your graph! Titles are set with other arguments to the update_layout() function, like this:
fig = go.Figure()
fig.update_layout(title='some graph title',
                  xaxis_title="The x-axis title",
                  yaxis_title="The y-axis title",
                  showlegend=True,
                  plot_bgcolor='white')
Here we add some titles, make sure the legend is on, and set the overall background color to white.
Don't just blindly paste these examples into your program. Place them thoughtfully, and perhaps combine them! At a minimum, make your graph look similar to the image in this section, or go further and experiment with these settings to create an even more interesting look!
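One way to keep the styling in one place is to build all of the update_layout() arguments in a single function. This is a sketch, not the required layout: styled_layout is a hypothetical name, and the axis titles 'Year' and 'Life expectancy (years)' are assumed wording you should adjust.

```python
def styled_layout(title):
    """Build keyword arguments for fig.update_layout(): axis styling plus titles."""
    # Axis appearance, shared by both axes (same values as the settings example)
    settings = dict(
        showline=True,
        showgrid=False,
        showticklabels=True,
        linecolor='rgb(204, 204, 204)',
        linewidth=2,
        ticks='outside',
        tickfont=dict(family='Arial', size=12, color='rgb(82, 82, 82)'),
    )
    return dict(
        title=title,
        xaxis=settings,
        yaxis=settings,
        xaxis_title='Year',  # assumed wording; pick your own
        yaxis_title='Life expectancy (years)',  # assumed wording; pick your own
        showlegend=True,
        plot_bgcolor='white',
    )
```

You would then apply it once per figure, for example fig.update_layout(**styled_layout('Life expectancy')), after creating each go.Figure() and before calling show().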
Continue editing lives.py
When the user enters multiple countries, create one additional line which is the average of all the others. Plot that line along with the countries, and call it "AVERAGE". See the image here for what it should look like. Make no other changes to your program.
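The AVERAGE line is a year-by-year mean across the selected countries. A minimal sketch, assuming each country is stored as (years, values) lists; average_series is a hypothetical name. It averages only the countries that actually have a value for a given year, which is one reasonable way to handle gaps; another is to skip any year where some country is missing.

```python
def average_series(series):
    """Average several countries year by year.

    series maps a country name to its (years, values) lists.
    Each year's average uses only the countries that have a value for that year.
    Returns (years, averages) with years in ascending order.
    """
    totals = {}
    counts = {}
    for years, values in series.values():
        for year, value in zip(years, values):
            totals[year] = totals.get(year, 0.0) + value
            counts[year] = counts.get(year, 0) + 1
    years = sorted(totals)
    averages = [totals[y] / counts[y] for y in years]
    return years, averages
```

Compute it only when more than one country was entered, then add it like any other trace, with name='AVERAGE'.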
Test your program with these five African nations:
Country: Kenya,Ghana,South Africa,Botswana,Angola
Look at the overall trend. Perhaps focus on your AVERAGE line. What pattern do you see? Do you have an explanation for what happened? You may already know this, or you may not. Search the Web for historical events that caused this aberration in our dataset.
REQUIRED: Create a readme.txt file and write a paragraph that describes the data trend you observe, as well as a historic explanation for it. Include in your explanation why Botswana and South Africa appear to have similar trends, but shifted in time.
What to submit: your two programs (life.py and lives.py), plus readme.txt answering Step 4's final analysis question (the trend in the five African nations).
Use the command-line to submit your files:
submit -c=sd211 -p=lab05 life.py lives.py readme.txt
...or, if you're on a lab computer rather than your laptop, you can visit the submit website and upload the three files.