Today's lab starts with a fully functional Jupyter notebook that analyzes Netflix's catalog and produces several graphs and visualizations. You will read through the notebook, understand its content, and then add a few graphs of your own based on what it already does.
Learning objectives: familiarity with notebooks, pandas review, visualization practice.
Download this zip file and unzip it. You'll see a Jupyter notebook and a CSV data file of Netflix titles.
Open the notebook in VS Code.
Change your Python interpreter (the selector at the top-right of the window) to your SD211 conda environment.
Open a terminal (Terminal->New Terminal).
Install the ipython kernel and the seaborn visualization module:
conda activate sd211
conda install ipykernel seaborn
Click the "Run All" button!
Read the notebook's markdown cells, scan the python code, and understand the graphs.
...hello. You're still reading this. Does that mean you read the notebook? Stop reading here and go read it! Open the notebook and understand what it is doing.
Find where the notebook creates the bar graph of how many movies are added each month. Do you see a pattern? Maybe, maybe not.
Create a new code cell below the graph, and write code to create the same graph, but this time only for movies/shows that have one of the following words in their 'title' column:
Christmas|Jingle|Reindeer|Holiday|Grinch
You have already seen the notebook's code to make a graph. You now just need to filter the pandas dataframe for rows where the 'title' column contains one of those strings. Here is part of that check, which uses the str.contains method:
df['title'].str.contains('Christmas|Jingle|Reindeer|Holiday|Grinch')
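The expression above returns a True/False value per row, which you can use to keep only the matching rows. Here is a minimal sketch on a made-up dataframe. The titles and the 'month_added' column name are assumptions for illustration only, so check the notebook for the column it actually uses for the month.

```python
import pandas as pd

# Toy stand-in for the Netflix dataframe. The rows and the 'month_added'
# column name are hypothetical; the real notebook may name things differently.
toy = pd.DataFrame({
    "title": ["A Christmas Prince", "Stranger Things", "Holiday Rush",
              "The Grinch", "Planet Earth"],
    "month_added": [11, 7, 11, 12, 3],
})

# The '|' in the pattern means "or": a title matches if it contains any word.
pattern = "Christmas|Jingle|Reindeer|Holiday|Grinch"

# na=False treats missing titles as non-matches instead of NaN.
mask = toy["title"].str.contains(pattern, na=False)

# Indexing with the boolean mask keeps only the rows where it is True.
holiday = toy[mask]

# Count the matching titles per month, ordered by month number.
per_month = holiday["month_added"].value_counts().sort_index()
print(per_month)
```

Note that str.contains is case-sensitive by default, so 'christmas' in lowercase would not match this pattern.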
Don't remember Pandas? Either look at the notebook for examples, or review our Pandas notes.
Requirements:
Find the notebook's pie graph of types of content (movies vs. TV shows). Create another pie graph below it, but instead show the country of origin of all shows.
This should be a really quick one. There is a 'first country' column in the netflix dataframe which has the main country where each show was released. Adapt the previous pie graph to do countries.
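If you want to see the general shape of a pie graph built from a column of categories, here is a rough sketch on toy data. It uses plain matplotlib, which may not match how the notebook draws its own pie graph, and the six rows are made up; only the 'first country' column name comes from the lab description.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the Netflix dataframe (rows are invented for illustration).
toy = pd.DataFrame({
    "first country": ["United States", "India", "United States",
                      "Japan", "India", "United States"],
})

# value_counts() tallies how many rows fall under each country,
# sorted from most common to least common.
counts = toy["first country"].value_counts()

# One wedge per country, labeled with its name and percentage share.
fig, ax = plt.subplots()
ax.pie(counts.values, labels=counts.index, autopct="%1.0f%%")
ax.set_title("Country of origin (toy data)")
```

With the real data there are many countries, so the graph can get crowded; whether to trim it is a judgment call that this sketch does not make for you.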
Requirements:
Find the notebook's bar graph "Show Ratings across all of Netflix" showing the ratings (e.g., PG-13) of all its shows and movies. Create another bar graph of ratings, just like it, but only for shows in the 'Romantic Movies' category.
You need to poke around the notebook to figure out how to identify Romantic Movies in the dataframe.
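Once you have a filtered dataframe, comparing it against the whole catalog is easier with shares than with raw counts. The sketch below shows one way to do that on toy data. The 'genres' column name and its comma-separated format are assumptions made up for this example; the notebook may store categories differently, so you still need to look there.

```python
import pandas as pd

# Toy data. The 'genres' column and its contents are hypothetical.
toy = pd.DataFrame({
    "rating": ["TV-14", "TV-MA", "PG-13", "TV-14", "TV-MA", "TV-14"],
    "genres": ["Romantic Movies, Comedies", "Dramas", "Romantic Movies",
               "Romantic Movies, Dramas", "Thrillers", "Documentaries"],
})

# Keep rows whose category text mentions the category of interest.
subset = toy[toy["genres"].str.contains("Romantic Movies", na=False)]

# normalize=True returns fractions that sum to 1 instead of raw counts,
# so a small subset can be compared fairly with the full dataframe.
all_share = toy["rating"].value_counts(normalize=True)
sub_share = subset["rating"].value_counts(normalize=True)

# Line the two up side by side; ratings absent from the subset become 0.
comparison = pd.DataFrame({"all": all_share, "romantic": sub_share}).fillna(0)
print(comparison)
```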
Requirements:
Visit the submit website and upload your edited notebook. Your notebook should Run All and have all outputs shown when saved.
README.txt: make a readme file that answers these questions:
1. What country is second to the US in Netflix content?
2. What two months have the most holiday movie releases?
3. What is the most popular rating (e.g., TV-PG) for shows that are Romantic Movies and how does that compare to all of Netflix?
4. What is the trend in show length (duration) of movies/shows since 2000? What does that suggest about your generation's attention span?
submit -c=sd211 -p=lab12 netflix_analysis.ipynb readme.txt