Lab 12: Jupyter Notebooks

Today's lab starts with a fully functional Jupyter notebook that analyzes Netflix's catalog and produces several graphs and visualizations. You will read through the notebook, understand its contents, and then add a few graphs of your own based on what it already does.

Learning objectives: familiarity with notebooks, pandas review, visualization practice.

Data and Notebook

Download this zip file and unzip it. You'll see a Jupyter notebook and a CSV data file of Netflix titles.

Step 0: Open and Install

Open the notebook in VS Code.

Change your Python (now at the top-right of the window) to your SD211 conda environment.

Open a terminal (Terminal->New Terminal).

Install the IPython kernel and the seaborn visualization module:

conda activate sd211
conda install ipykernel seaborn
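To confirm the install worked, one option is a quick check from Python inside the sd211 environment. This is only a sketch: it asks whether each module can be found and does not test that the notebook kernel is actually selected in VS Code.

```python
import importlib.util

# Modules the lab installs with conda
required = ["ipykernel", "seaborn"]

# find_spec returns None when a top-level module cannot be found
missing = [name for name in required if importlib.util.find_spec(name) is None]

if missing:
    print("Missing from this environment:", ", ".join(missing))
else:
    print("All lab modules are available.")
```

If anything is reported missing, make sure the sd211 environment is active and rerun the conda install line.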

Step 1: Run and Analyze

Click the "Run All" button!

Read the notebook's markdown cells, scan the Python code, and understand the graphs.

...hello. You're still reading this. Does that mean you read the notebook? Stop reading here and go read it! Open the notebook and understand what it is doing.

Step 2: Graph holiday movie months

Find where the notebook creates the bar graph of how many movies are added each month. Do you see a pattern? Maybe, maybe not.

Create a new code cell below the graph, and write code to create the same graph, but this time only for movies/shows that have one of the following words in their 'title' column:
Christmas|Jingle|Reindeer|Holiday|Grinch

The notebook already contains the code to make the graph. You just need to filter the pandas dataframe for rows where the 'title' column contains one of those strings. Here is part of that check, using the str.contains function:

df['title'].str.contains('Christmas|Jingle|Reindeer|Holiday|Grinch')

Don't remember Pandas? Either look at the notebook for examples, or review our Pandas notes.
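As a refresher, here is a minimal sketch of how a str.contains mask filters rows and how the filtered rows can then be counted per month. The dataframe below is toy data, and 'month_added' is a hypothetical column name; the notebook may store the month under a different name, so check its existing bar graph code.

```python
import pandas as pd

# Toy stand-in for the Netflix dataframe.
# 'month_added' is a hypothetical column name used only for this illustration.
df = pd.DataFrame({
    "title": ["A Christmas Prince", "Stranger Things", "Holiday Rush", "The Grinch", None],
    "month_added": [11, 7, 11, 12, 3],
})

pattern = "Christmas|Jingle|Reindeer|Holiday|Grinch"

# na=False treats missing titles as non-matches, so the mask is purely True/False
mask = df["title"].str.contains(pattern, na=False)

# Keep only the matching rows, then count how many fall in each month
holiday = df[mask]
counts = holiday["month_added"].value_counts().sort_index()
print(counts)
```

The resulting counts are what a bar graph would plot; the graphing code itself can follow the notebook's existing cell.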

Requirements:

Create a bar graph just like the notebook's existing one.
Change the title of the graph to "Number of Holiday Movies Added Each Month"
Show correct bars for only titles that contain one of our holiday words.
Create a Markdown Cell above your Code Cell solution, and include at least two types of markdown elements.

Step 3: Pie Chart by Country

Find the notebook's pie graph of types of content (movies vs tv shows). Create another pie graph below it, but instead show the country of origin of all shows.

This should be a really quick one. There is a 'first country' column in the Netflix dataframe that holds the main country where each show was released. Adapt the previous pie graph to show countries.
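The percentages on a pie chart come from each value's share of the rows. The sketch below computes those shares on toy data; only the 'first country' column name comes from the lab, and the rows are made up.

```python
import pandas as pd

# Toy data; the real dataframe has many more rows and countries
df = pd.DataFrame({
    "first country": ["United States", "India", "United States", "United Kingdom"],
})

# normalize=True returns fractions of the total; multiply by 100 for percentages
shares = df["first country"].value_counts(normalize=True) * 100
print(shares.round(1))
```

Numbers like these are what the percentage labels on each wedge represent, so they are a handy way to sanity-check your chart against the expected value for the US.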

Requirements:

Create a pie chart just like the notebook's existing one.
Change the title of the graph to "Country Distribution of Shows"
Your pie should have the country names and the % of shows for each. The US should be the biggest at 45.9%.
Create a Markdown Cell above your Code Cell solution, and include at least two types of markdown elements not used in the prior step.

Step 4: Bar Graph of Romantic Movie Ratings

Find the notebook's bar graph "Show Ratings across all of Netflix" showing the ratings (e.g., PG-13) of all its shows and movies. Create another bar graph of ratings, just like it, but only for shows in the 'Romantic Movies' category.

You need to poke around the notebook to figure out how to identify Romantic Movies in the dataframe.
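One common layout for categories is a single text column holding a comma-separated list. The sketch below assumes that layout and a hypothetical column name 'listed_in'; confirm in the notebook how categories are actually stored before relying on it.

```python
import pandas as pd

# Toy data; 'listed_in' is an assumed column name holding comma-separated categories
df = pd.DataFrame({
    "rating": ["TV-14", "PG-13", "TV-14", "R"],
    "listed_in": [
        "Romantic Movies, Comedies",
        "Action & Adventure",
        "Dramas, Romantic Movies",
        "Horror Movies",
    ],
})

# regex=False matches the literal phrase; na=False skips rows with no category
is_romantic = df["listed_in"].str.contains("Romantic Movies", regex=False, na=False)

# Count ratings only among the rows flagged as romantic
rating_counts = df.loc[is_romantic, "rating"].value_counts()
print(rating_counts)
```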

Requirements:

Create a bar graph just like the notebook's existing one.
Change the title of the graph to "Romantic Show Ratings"
Your graph should show ratings for just 'Romantic Movies'.
Create a Markdown Cell above your Code Cell solution, and include at least two types of markdown elements not used in the prior two steps.

What to turn in

Visit the submit website and upload your edited notebook. Your notebook should run cleanly with Run All and be saved with all outputs shown.
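A saved notebook is a JSON file, so it is possible to check for missing outputs before submitting. The helper names below are hypothetical and this is only a rough sketch: some code cells (for example, ones that only assign variables) legitimately produce no output, so treat flagged cells as candidates to look at, not as errors.

```python
import json

def load_notebook(path):
    """Read a saved .ipynb file into a Python dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def code_cells_without_output(notebook):
    """Return 1-based positions of non-empty code cells with no saved output."""
    missing = []
    for position, cell in enumerate(notebook.get("cells", []), start=1):
        if cell.get("cell_type") != "code":
            continue
        # 'source' may be a string or a list of strings; join handles both
        source = "".join(cell.get("source", []))
        if source.strip() and not cell.get("outputs"):
            missing.append(position)
    return missing

# Tiny in-memory notebook used to demonstrate the check
sample = {
    "cells": [
        {"cell_type": "markdown", "source": ["# Title"]},
        {"cell_type": "code", "source": ["print(1)"],
         "outputs": [{"output_type": "stream", "name": "stdout", "text": ["1\n"]}]},
        {"cell_type": "code", "source": ["x = 2"], "outputs": []},
    ]
}
result = code_cells_without_output(sample)
print(result)
```

To check your own file, pass the result of load_notebook("netflix_analysis.ipynb") to code_cells_without_output.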

README.txt: make a readme file that answers these questions:
1. What country is second to the US in Netflix content?
2. What two months have the most holiday movie releases?
3. What is the most popular rating (e.g., TV-PG) for shows that are Romantic Movies and how does that compare to all of Netflix?
4. What is the trend in show length (duration) of movies/shows since 2000? What does that suggest about your generation's attention span?

submit -c=sd211 -p=lab12 netflix_analysis.ipynb readme.txt