Lab 8: Netflix Recommender

In today's lab you will write two programs: a basic Netflix query tool, and a show/movie recommendation system based on the user's favorite shows. The data about Netflix shows comes in two different files, and the information is linked between them through unique identifiers (IDs). This is a common data challenge when information is scattered across different sources. We'll practice using Lists and Functions to accomplish our goals.

In today's lab, you will perform ... Data Processing.

Step 0: Folder and Files

Create a new folder lab08 inside your SD211 folder for this lab. Create an initial Python program called netflix.py.

Download two files: titles.csv and credits.csv

Open the files and make sure you understand how they're organized. Note that each show title has an ID which appears in the credits file with the actors. Think about it: how would I find all actors for a show like "Stranger Things"?

API: Utility Functions (50%)

Open your netflix.py and copy this starter code into it. This will be your Netflix function library. We wrote the initial function definitions for you; fill them in as described below. When finished, run the program and make sure it passes its tests.

find_show_id(string)

This function should take the name of a show as its argument ("Stranger Things") and return the unique identifier for that show ("ts38796"). You may assume the file titles.csv is in the current directory.

find_actor_id(string)

Similar to above, this function should take the name of an actor as its argument ("winona ryder") and return the unique identifier for that actor as a string ("5937"). You may assume the file credits.csv is in the current directory.
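
Both ID lookups follow the same pattern: scan the rows until a name matches (ignoring case, since the user may type "winona ryder"), then return the ID from that same row. Here is a minimal sketch of that pattern, not the required solution. It assumes a hypothetical column layout of id,title, so check the header of the real titles.csv and credits.csv because the actual columns may differ. It reads from an in-memory string instead of a file so it runs on its own. The helper name find_id_in_rows and the title "Hello, World" with its ID "tm2" are made up for illustration, and returning None when nothing matches is also an assumption.

```python
import csv
import io

# Stand-in for titles.csv. The column layout (id,title) is an assumption.
SAMPLE_TITLES = 'id,title\nts38796,Stranger Things\ntm2,"Hello, World"\n'


def find_id_in_rows(rows, name, id_col=0, name_col=1):
    """Return the ID in the first row whose name matches, ignoring case."""
    wanted = name.strip().lower()
    for row in rows:
        # Skip rows that are too short to hold both columns.
        if len(row) <= max(id_col, name_col):
            continue
        if row[name_col].strip().lower() == wanted:
            return row[id_col]
    return None  # no match found (assumed behavior)


rows = list(csv.reader(io.StringIO(SAMPLE_TITLES)))
print(find_id_in_rows(rows, "stranger things"))  # ts38796
print(find_id_in_rows(rows, "hello, world"))     # tm2
```

Note that csv.reader keeps a quoted title containing a comma as one field, whereas splitting each line on "," would cut such a title short.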

find_show_title(string)

This function should take the ID of a show as its argument ("ts38796") and return the string name for that show ("Stranger Things"). You may assume the file titles.csv is in the current directory.

find_actors(string)

This function should take the ID of a show as its argument ("ts38796") and return a List of strings. The returned list contains the names of all actors in the given show. You may assume the file credits.csv is in the current directory. Calling this function on "ts38796" (Stranger Things) should return:
['Winona Ryder', 'David Harbour', 'Millie Bobby Brown', 'Finn Wolfhard', ...]

find_shows(string)

This function takes an actor ID (a string) as its argument. It returns a List of strings containing the names of all shows/movies that the given actor has played in. You may assume the file credits.csv is in the current directory. Calling this function on "2317" (Adam Sandler) should return:
['Happy Gilmore', 'Grown Ups', 'I Now Pronounce You Chuck & Larry', 'Just Go with It', ...]
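
The list-returning functions differ from the ID lookups in one way: instead of stopping at the first match, they keep scanning and append every match to a list. The sketch below also shows the ID linking idea from Step 0, where a credit row names a show only by its ID and the title has to be fetched separately. Treat it as an illustration under assumptions: the column layouts (person_id,id,name for credits and id,title for titles), all sample names and IDs, and the helper names title_for and shows_for are made up, and the real files may be organized differently.

```python
import csv
import io

# Stand-ins for the two files; both column layouts are assumptions.
SAMPLE_TITLES = "id,title\ntm1,First Movie\ntm2,Second Movie\nts3,Some Series\n"
SAMPLE_CREDITS = (
    "person_id,id,name\n"
    "100,tm1,Alex Example\n"
    "200,tm1,Sam Sample\n"
    "100,ts3,Alex Example\n"
)


def title_for(show_id, title_rows):
    """Return the title stored with show_id, or None if it is missing."""
    for row in title_rows:
        if row[0] == show_id:
            return row[1]
    return None


def shows_for(actor_id, credit_rows, title_rows):
    """Collect the title of every show credited to actor_id."""
    shows = []
    for row in credit_rows:
        if row[0] == actor_id:
            # The credit row holds only the show ID; look up its title.
            title = title_for(row[1], title_rows)
            if title is not None:
                shows.append(title)
    return shows


title_rows = list(csv.reader(io.StringIO(SAMPLE_TITLES)))
credit_rows = list(csv.reader(io.StringIO(SAMPLE_CREDITS)))
print(shows_for("100", credit_rows, title_rows))  # ['First Movie', 'Some Series']
```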

After writing these functions, run netflix.py and make sure you pass all the tests!

Program 1: A Netflix Lookup Program (75%)

Create a blank program lookup.py. Import your above library into it:

import netflix

Now write your lookup program to allow the user to ask about shows and actors, mimicking the interaction shown here, and using your functions where appropriate. You must use these functions. You will not receive full credit if you open files directly in this lookup.py program.

> python3 lookup.py
Welcome to the Netflix lookup!
Query? actors stranger things       <-- search "stranger things"
   Winona Ryder
   David Harbour
   Millie Bobby Brown
   Finn Wolfhard
   Gaten Matarazzo
   Caleb McLaughlin
   Noah Schnapp
   Sadie Sink
   Natalia Dyer
   Charlie Heaton
   Joe Keery
   Maya Hawke
   Brett Gelman
   Priah Ferguson
   Matthew Modine
   Paul Reiser
   Jamie Campbell Bower
Query? shows winona ryder
   "Girl                   <-- got cut off by the comma in the title; you can ignore
   Little Women
   Stranger Things
   Destination Wedding
   Sarah Cooper: Everything's Fine
Query? actors destination wedding
   Winona Ryder
   Keanu Reeves
   DJ Dallenbach
   Ted Dubost
   D. Rosh Wright
   Greg Lucey
   Curt Dubost
   Donna Lynn Jones
   Victor Levin
Query? done
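
Each query above is a command word ("actors", "shows", or "done") followed by a name that may itself contain spaces. One way to separate the two is to split at the first space only. This is a small sketch of that idea; the function name parse_query is hypothetical, and you are free to parse the input differently.

```python
def parse_query(line):
    """Split 'actors stranger things' into ('actors', 'stranger things')."""
    # partition splits at the FIRST space only, so the name stays whole.
    command, _, argument = line.strip().partition(" ")
    return command.lower(), argument.strip()


print(parse_query("actors stranger things"))  # ('actors', 'stranger things')
print(parse_query("shows winona ryder"))      # ('shows', 'winona ryder')
print(parse_query("done"))                    # ('done', '')
```

The command then decides which of your netflix functions to call with the argument.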

Program 2: Netflix Recommendations (100%)

Start a blank program called recommend.py. Import your functions from netflix.py.

Your program will make show recommendations based on the user's favorite shows. The interaction should look just like the example below; we describe the details after it:

> python3 recommend.py
List some of your favorite movies and shows on Netflix...
Title: breaking bad
Title: ozark
Title: done
I recommend the following shows!
   Breaking Bad
   Ozark
   A Farewell to Ozark
   El Camino: A Breaking Bad Movie
   The Road to El Camino: Behind the Scenes of El Camino: A Breaking Bad Movie
   Better Call Saul
   Total Recall
   Time Share
   Thunder Force
   The Sweetest Thing

You will use your netflix API to help with this. Again, you will not receive full credit if you open a file directly in this program. Let me step you through the process, although perhaps you can pause and think about it yourself first!

  1. Your program first asks the user for a list of Netflix shows that they like. Loop until "done". You'll save them in a list. For instance:
    [ 'breaking bad', 'ozark' ]

  2. You'll then create one big list of all actors in each of their entered shows. It's ok to have repeat names in your list, for instance, if the same actor was in several of the user's favorites. No repeats in our example here, though:
    ['Bryan Cranston', 'Aaron Paul', 'Anna Gunn', 'Dean Norris', 'Jonathan Banks', 'Bob Odenkirk', 'Betsy Brandt', 'RJ Mitte', 'Jason Bateman', 'Laura Linney', 'Sofia Hublitz', 'Skylar Gaertner', 'Julia Garner', 'Charlie Tahan']

  3. Once you have these actors, you'll then look up all shows for each actor in the list, creating yet another list of all shows for all actors. Again, repeats are ok! In fact, you want repeats so you can see which shows have more of their actors.
    ['Breaking Bad', 'Argo', 'Contagion', 'Total Recall', 'Kung Fu Panda 3', 'El Camino: A Breaking Bad Movie', 'Animal', 'Breaking Bad', 'BoJack Horseman', 'BoJack Horseman Christmas Special', 'El Camino: A Breaking Bad Movie', 'The Road to El Camino: Behind the Scenes of El Camino: A Breaking Bad Movie', 'Breaking Bad', 'Starship Troopers', 'Gattaca', 'Breaking Bad', 'How Do You Know', 'Linewatch', 'The Frozen Ground', 'Beirut', 'The Book of Henry', 'Girlboss', 'Scary Stories to Tell in the Dark', 'Breaking Bad', 'Better Call Saul', 'Term Life', 'Skylanders Academy', 'Mudbound', 'El Camino: A Breaking Bad Movie', 'The Road to El Camino: Behind the Scenes of El Camino: A Breaking Bad Movie', 'A Tale Dark & Grimm', 'Breaking Bad', 'Better Call Saul', 'W/ Bob & David', "Girlfriend's Day", 'Dolemite Is My Name', 'Breaking Bad', 'Straight Up', 'Breaking Bad', 'Time Share', 'Not a Game', 'Arrested Development', 'Hancock', 'The Sweetest Thing', 'Starsky & Hutch', 'The Gift', 'Ozark', 'Thunder Force', 'A Farewell to Ozark', 'Love Actually', 'Arthur Christmas', 'Nocturnal Animals', 'Ozark', 'Tales of the City', 'A Farewell to Ozark', 'Ozark', 'A Farewell to Ozark', 'Ozark', 'A Farewell to Ozark', 'Ozark', 'Inventing Anna', 'A Farewell to Ozark', 'I Am Legend', 'Blue Jasmine', 'Ozark', 'The Land of Steady Habits', 'Poms', 'A Farewell to Ozark']

  4. You now have a big list of shows! You want to sort these by relevance. Which are repeated? If a show appears many times, then the user will probably like it. We provided you a function in your netflix.py starter code called sort_list_by_frequency(list) that does this for you. It returns a new list of strings, ordered from the most frequent to the least frequent:

    import netflix
    sortedlist = netflix.sort_list_by_frequency(yourlist)

  5. Print the first TEN shows in the returned sorted list. If there are fewer than ten, don't crash your program!

  6. (optional) Change that final Top 10 print loop so it doesn't print the user's original show inputs, but still prints 10 shows total. This is actually slightly tricky.
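
Steps 2 through 5 can be sketched as one small pipeline. To run without the CSV files, this sketch defines stand-in functions with the same names as the lab's functions, backed by tiny made-up data; in your recommend.py you would call the real ones through your netflix module instead. The stand-in for sort_list_by_frequency assumes that the provided function returns each distinct title once, most frequent first. How the real one orders ties is not specified here, so don't rely on this sketch for that. The function name recommend is hypothetical.

```python
from collections import Counter

# Made-up data so the sketch runs without the CSV files.
SHOW_IDS = {"show one": "s1", "show two": "s2"}
CAST = {"s1": ["Ann", "Bob", "Cy"], "s2": ["Cy"]}
ACTOR_IDS = {"ann": "1", "bob": "2", "cy": "3"}
SHOWS = {"1": ["Show One"],
         "2": ["Show One", "Show Three"],
         "3": ["Show One", "Show Two"]}


# Stand-ins that mimic the lab's function names (assumed behavior).
def find_show_id(title):
    return SHOW_IDS.get(title.lower())


def find_actors(show_id):
    return CAST.get(show_id, [])


def find_actor_id(name):
    return ACTOR_IDS.get(name.lower())


def find_shows(actor_id):
    return SHOWS.get(actor_id, [])


def sort_list_by_frequency(items):
    # Distinct items, most frequent first.
    return [item for item, _ in Counter(items).most_common()]


def recommend(favorites, limit=10):
    """Favorites -> actors -> all their shows -> most frequent shows."""
    actors = []
    for title in favorites:
        show_id = find_show_id(title)
        if show_id is not None:          # ignore titles we cannot find
            actors.extend(find_actors(show_id))
    shows = []
    for actor in actors:                 # repeats are kept on purpose
        shows.extend(find_shows(find_actor_id(actor)))
    # Slicing never fails, even when fewer than `limit` shows exist.
    return sort_list_by_frequency(shows)[:limit]


print(recommend(["show one", "show two"]))
# ['Show One', 'Show Two', 'Show Three']
```

Taking a slice such as [:10] is one way to satisfy step 5 safely, because a slice of a short list simply returns what is there.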

Your program should match EXACTLY the output example above, as well as work on other examples!

(optional, extra credit) Use Dictionaries

This lab intentionally avoided dictionaries so you would focus on functions and multi-file programs. You may have noticed that each function in our Netflix API is opening a file every time you call it. That's very inefficient. You really just want to open the files once, and save the information you need in memory. The functions then access what was previously read.

In this section, copy your netflix.py to netflix-dict.py and make changes to netflix-dict.py so that you only open the credits.csv and titles.csv files once. Read their data into dictionaries (actor names as keys, IDs as values, and vice versa). Change the functions so that they use the dictionaries instead of opening files.
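
To make the idea concrete, here is a sketch of the read-once approach for the titles data only. It loads the rows a single time into two dictionaries, one per lookup direction, and the functions then answer from memory. As before, the id,title column layout, the presence of a header row, and the sample row "Hello, World" are assumptions, and an in-memory string stands in for the file. The helper name load_title_maps is hypothetical.

```python
import csv
import io

# Stand-in for titles.csv; the (id,title) column layout is an assumption.
SAMPLE_TITLES = 'id,title\nts38796,Stranger Things\ntm2,"Hello, World"\n'


def load_title_maps(handle):
    """Read the rows once and build lookups in both directions."""
    id_by_title = {}
    title_by_id = {}
    reader = csv.reader(handle)
    next(reader, None)  # skip the header row (assumed to exist)
    for row in reader:
        show_id, title = row[0], row[1]
        id_by_title[title.lower()] = show_id   # lowercase keys: case-insensitive lookup
        title_by_id[show_id] = title
    return id_by_title, title_by_id


# Built a single time, when the module is first loaded.
ID_BY_TITLE, TITLE_BY_ID = load_title_maps(io.StringIO(SAMPLE_TITLES))


def find_show_id(title):
    return ID_BY_TITLE.get(title.lower())


def find_show_title(show_id):
    return TITLE_BY_ID.get(show_id)


print(find_show_id("stranger things"))  # ts38796
print(find_show_title("tm2"))           # Hello, World
```

In your own netflix-dict.py you would pass an opened titles.csv to the loader instead of the in-memory string, and build similar dictionaries for credits.csv.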

What to turn in

Your three programs (netflix.py, lookup.py, and recommend.py).

submit -c=sd211 -p=lab08 netflix.py lookup.py recommend.py