Today's lab will have you create a conversational agent that can look up information in a user-driven conversation. Similar to Lab 1, you'll read text input from the user, but now that we've learned about loops and if statements, we can create something more interesting. We'll also automatically query the Web for answers to the user's data queries.
This lab will query Wikipedia for information. Install the wikipedia library:
conda activate sd211
pip3 install wikipedia
To help with the graphical interface, we have a custom chatbot library!
Save this file to your lab03 directory and rename it to easychat.py
Create a file called basic.py as your program. At the top of Python programs you will typically import the libraries you need. Today your library is the easychat.py file you just downloaded, so you can import the "App" object from it like so:
from easychat import App
# This creates an object to hold your GUI in a variable called app
app = App("Name your Bot")
Run it and you should see a chat box open like the image on the right! Click X to close it.
How do we use this chatbox interface? You now have an app variable which contains your GUI app. It has its own print(str) and input(str) functions just like the terminal ones you are used to. Calling input(str) will wait for the user to type a reply to your string:
app = App("Name your Bot") # You should give it a name.
reply = app.input("How are you?")
The above will ask the user how they are in the chat screen, and return the string reply they type.
You now have a 3-line program basic.py from Part 0. Below are the requirements for Part 1. You'll need a number of string functions (the grey box below) to do much of this lab.
When the program starts, give a greeting and ask for the user's name. Tell them you're pleased to meet the user, printing their name back at them.
Next, your program should begin a sequence of printed messages and retrieved replies from the user that never ends until the user says "bye". What is this sequence? There are certain inputs you will act on (next numeric item), and others you could just keep printing "Tell me more", for instance. The program should end with a farewell like "Goodbye for now!" after the user enters "bye".
Required replies:
Important: string functions
In order to do the above, we need to know a few of Python's basic string functions! These first two are common string manipulation functions:
Lowercase a String: Just use the lower() function that every string value carries around with it.
text = "Oranges are my favorite"
text = text.lower() # now the text value is lowercased
String Contains Test: Use Python's in operator. You ask if a string is in another string, and Python returns True/False:
text = "Oranges are my favorite"
if "oranges" in text.lower(): # see my .lower() call too?
    print("I love oranges!")
String starts with another string: You want to know if a string starts with another string, use startswith(str):
s = "Hello my friend!"
if s.startswith("Hello"):
    print('It started with Hello!')
String ends with another string: Same as startswith, but on the back side of the string:
s = "Hello my friend!"
if s.endswith("!"):
    print('You sound excited!!')
Need more? Here is every string function available to you. You are free to use any of these as you see fit.
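To see how these string tests can drive the never-ending reply sequence from Part 1, here is a minimal sketch. The names reply_to, chat, and fake_input are made up for illustration, the reply rules are placeholders rather than the required ones, and a scripted list stands in for the GUI so the sketch runs on its own. In your program you would pass app.input and app.print instead.

```python
def reply_to(text):
    """Pick a reply for one user message (these rules are placeholders)."""
    lowered = text.lower()
    if "oranges" in lowered:
        return "I love oranges!"
    if lowered.endswith("!"):
        return "You sound excited!!"
    return "Tell me more"


def chat(get_input, show):
    """Keep replying until the user says "bye", then say farewell."""
    text = get_input("What is on your mind?")
    while text.lower().strip() != "bye":
        # The reply becomes the next prompt, and we wait for the next message
        text = get_input(reply_to(text))
    show("Goodbye for now!")


# Scripted stand-ins (hypothetical) so the sketch runs without the GUI.
scripted = iter(["Oranges are great", "Wow!", "ok", "bye"])
transcript = []


def fake_input(prompt):
    transcript.append(prompt)   # record what the bot said
    return next(scripted)       # pretend the user typed the next line


chat(fake_input, transcript.append)
for line in transcript:
    print("Bot:", line)
```

The while loop is the important part: it keeps asking for input until the lowercased text equals "bye", and the farewell happens once, after the loop.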
Copy basic.py to chatbot.py; you will extend this second program.
Now you will query a popular data storage website: Wikipedia. This part will take the first step of just answering questions by calling Wikipedia's API to get its text summaries that match the user's topic. If you've ever used Alexa or Google Home, you might notice that this is what they do too! They just parrot back the first sentence of the Wikipedia page.
Your Task: answer any user questions that start with the word "what", "where", or "who" by querying Wikipedia, and printing out the first summary sentence of the matching Wikipedia page.
Your user asked a question as a string. You need to pull out what they are asking about. String manipulation like this is a common need in data processing. You did some string comparison/checking above. Now let's do some string slicing and dicing. For this part, the user will ask you some questions, and you need to pull out the thing they're asking about:
"What is the White House?" --> "the White House"
"Where is New Zealand?" --> "New Zealand"
How to do this? You just need to pull off the first two words of the wh-question. Below are two string tools at your disposal (find, and substrings). Look at these examples, and it's up to you to put them together to make it happen:
question = "What is a navy seal?"
# returns the index of the first space char ' '
i = question.find(' ')
# returns the index of the first space char ' ' at or after index 10
i = question.find(' ', 10)
# grabs the substring from characters 3-7 (exclusive of 8)
substring = question[3:8]
# grabs the substring from character 3 all the way to the end of the string
substring = question[3:]
For your wh-questions, I suggest finding the second space character, and taking the substring from that location+1 to the end of their question! Once you get the substring you want, make sure you store it in a variable! A good test is to print it out, maybe parrot it back to the user with a print statement.
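One way the suggestion above could look in code is sketched below. The function name extract_topic is made up for illustration, and stripping the trailing "?" is an assumption based on the example outputs shown earlier ("What is the White House?" --> "the White House").

```python
def extract_topic(question):
    """Return everything after the second space, minus a trailing '?'."""
    first_space = question.find(' ')
    # Start the second search one character past the first space
    second_space = question.find(' ', first_space + 1)
    topic = question[second_space + 1:]
    return topic.rstrip('?')


print(extract_topic("What is the White House?"))  # the White House
print(extract_topic("Where is New Zealand?"))     # New Zealand
```

Note that find() returns -1 when there is no space at all, so a one-word input simply comes back unchanged here. You may want to handle that case differently.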
Now we're ready to query Wikipedia. We will use the Python Wikipedia library to do all the hard work of querying Wikipedia across our internet connection and sending us what we want. You already installed it above. You just need to import the library into your program. Add this line to the top of your program:
import wikipedia
Then query Wikipedia for a sentence from a wiki page's summary paragraph!
summary = wikipedia.summary("Naval Academy", sentences=1)
This works because your import statement set up a variable called wikipedia that contains the summary() function.
PLOT TWIST: this function guesses at the Wikipedia page you want with a "best" string match, but sometimes it guesses very wrong, failing and crashing ("Tom Hanks" -> "Tom Hands"). That's super annoying, so we will wrap the function call summary() to "catch" the error when it happens. We use this try/except syntax which we'll cover later in the semester. For now, just do this:
try:
    summary = wikipedia.summary("Naval Academy", sentences=1)
except Exception:
    # the "best match" guess failed, so retry with the exact title we gave
    summary = wikipedia.summary("Naval Academy", sentences=1, auto_suggest=False)
You now know how to query Wikipedia. Your task for this part is to answer "what", "where", and "who" questions with wiki responses. Parse the question to get the search query, and then print the first sentence of the wiki page.
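Here is a rough sketch of how the pieces of this part might fit together. The names WH_WORDS, is_wh_question, answer, and fake_lookup are all made up for illustration. So that the sketch runs without an internet connection, the Wikipedia call is replaced by a stand-in function. In your program, the lookup would be the try/except call to wikipedia.summary() shown above.

```python
WH_WORDS = ("what", "where", "who")


def is_wh_question(question):
    # startswith() also accepts a tuple and checks each prefix in turn
    return question.lower().startswith(WH_WORDS)


def answer(question, lookup):
    """Return a one-sentence answer, or None if this is not a wh-question.

    lookup is any function that maps a topic string to a summary string.
    """
    if not is_wh_question(question):
        return None
    second_space = question.find(' ', question.find(' ') + 1)
    topic = question[second_space + 1:].rstrip('?')
    return lookup(topic)


def fake_lookup(topic):
    # Stand-in for the real Wikipedia query (hypothetical)
    return "Summary for " + topic


print(answer("Who is Tom Hanks?", fake_lookup))  # Summary for Tom Hanks
print(answer("I like oranges", fake_lookup))     # None
```

One caveat with this simple prefix test: a message like "Whatever you say" also starts with "what", so you may want a stricter check.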
This part answers an information question directly: ages. For example, "How old is Tom Hanks?" ("63"). If you query Wikipedia with "tom hanks", you see his birth date in the text, but not his age. We thus need to calculate his age from the birth date.
The tricky part here is that the data you need is not formatted exactly how you want it. It's rare in any data science challenge that the data you want is formatted nicely from the start. Language is particularly difficult to deal with, so we'll need to do more string munging in this part. Here are examples of what you'll get back from people queries:
Donald Ervin Knuth ( kə-NOOTH; born January 10, 1938) is an American computer scientist, mathematician,...
Conan Christopher O'Brien (born April 18, 1963) is an American television host, comedian, writer, podcaster...
Barack Hussein Obama II ( (listen) bə-RAHK hoo-SAYN oh-BAH-mə; born August 4, 1961) is an American politician...
See the challenge? See the birth dates? They're in a pretty regular pattern. Parse the string to pull out the birth year, and then calculate how old they are. For this lab, I'll just require you to be correct within one year. You can ignore the month and day. Extra credit is available if you do full date processing to correctly get their age based on the month and day relative to the current date (look up the Python datetime library). Here is the interaction you must implement:
User: How old is Olivia Wilde?
Bot: Olivia Wilde is 35 years old.
HINT: the word "born" will almost always be there, so assume it is there, and find that first. Write code to find 'born', and then maybe jump to the closing parenthesis (strings have find() that searches for substrings! See above.).
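A minimal sketch of the hint, assuming the summary follows the "(born Month Day, Year)" pattern from the examples above, so that the four characters just before the closing parenthesis are the birth year. The function names birth_year and approximate_age are made up for illustration, and the current year is passed in as a parameter so the result is predictable.

```python
def birth_year(summary):
    """Pull out the four-digit year that sits right before the ')' after 'born'."""
    born_at = summary.find('born')
    # Search for ')' starting at 'born' so earlier parentheses are skipped
    close_at = summary.find(')', born_at)
    return int(summary[close_at - 4:close_at])


def approximate_age(summary, current_year):
    # Ignores month and day, so this can be off by one year
    return current_year - birth_year(summary)


text = "Conan Christopher O'Brien (born April 18, 1963) is an American television host"
print(birth_year(text))             # 1963
print(approximate_age(text, 2020))  # 57
```

Starting the second find() at the position of "born" matters for summaries like the Barack Obama one, where a "(listen)" shows up before the birth date. Pages that do not follow this pattern will break the sketch.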
Important: Your output must have their name with no extra punctuation. For instance, this is incorrect:
User: How old is Olivia Wilde?
Bot: Olivia Wilde? is 35 years old.
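One possible way to get a clean name is sketched below. It assumes the question always begins with the three words "How old is", and the names PREFIX and extract_name are made up for illustration. The age 35 is just the value from the example interaction above.

```python
PREFIX = "How old is "


def extract_name(question):
    """Drop the 'How old is ' prefix and any trailing '?' or spaces."""
    name = question[len(PREFIX):]
    return name.rstrip(' ?')


name = extract_name("How old is Olivia Wilde?")
print(name + " is 35 years old.")  # Olivia Wilde is 35 years old.
```

Slicing by the length of the prefix works no matter how the user capitalized "How old is", since only the number of characters matters.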
Bonus points if you substitute the name with the correct pronoun. Instead of "Olivia Wilde is 35 years old.", you output "She is 35 years old." Think about how you might figure that out for arbitrary men and women.
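For the exact-age extra credit mentioned earlier in this part, the datetime library can parse a date string such as "April 18, 1963". The sketch below is one possible approach, not the required one. The function name exact_age is made up, it assumes you have already cut the birth date text out of the summary, and it assumes English month names (the default on most systems).

```python
from datetime import date, datetime


def exact_age(birth_text, today):
    """Age in whole years on the given day, using month and day too."""
    born = datetime.strptime(birth_text, "%B %d, %Y").date()
    # Has this year's birthday already happened (or is it today)?
    had_birthday = (today.month, today.day) >= (born.month, born.day)
    return today.year - born.year - (0 if had_birthday else 1)


print(exact_age("April 18, 1963", date(2020, 4, 17)))  # 56
print(exact_age("April 18, 1963", date(2020, 4, 18)))  # 57
```

In your chatbot you would pass date.today() as the second argument instead of a fixed date.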
Answer population questions of the form, "What is the population of X?". If you query Wikipedia with the phrase "population of X", you actually get a page about the population! Try it out with 3-4 sentences instead of just 1. Here is the "population of canada":
Canada ranks 38th by population, comprising about 0.5% of the world's total, with over 37 million Canadians as of 2019. Despite being the fourth-largest country by land area (second-largest by total area), the vast majority of the country is sparsely inhabited, with most of its population south of the 55th parallel north and more than half of Canadians live in just two provinces: Ontario and Quebec.
You must implement this behavior:
User: What is the population of Canada?
Bot: The population is 37 million
The difficulty is that the population is hidden in the text. Try extracting it! Notice that the word "million" or "billion" will almost always be involved. Write code to find that, and then find spaces in front and behind it (strings have rfind() which is like find() but searches in reverse! Look at rfind's documentation for how to use it). Your code just has to work on the countries that match this type of pattern. They don't all follow it.
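A minimal sketch of the find()/rfind() idea, assuming the number comes right before the word "million" or "billion" with a single space in between. The function name extract_population is made up for illustration, and the sketch checks for "million" first, which may not be what you want if both words appear in the text.

```python
def extract_population(text):
    """Return something like '37 million', or None if no unit word is found."""
    for unit in ("million", "billion"):
        unit_at = text.find(unit)
        if unit_at != -1:
            # The number sits between the previous space and the space
            # immediately before the unit word. rfind() searches backwards
            # within text[0:unit_at - 1].
            number_start = text.rfind(' ', 0, unit_at - 1) + 1
            return text[number_start:unit_at - 1] + " " + unit
    return None


sample = "Canada ranks 38th by population, with over 37 million Canadians as of 2019."
print("The population is " + extract_population(sample))  # The population is 37 million
```

The second and third arguments to rfind() limit the search to the part of the string before the unit word, so it finds the space in front of the number rather than some later space.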
Submit a readme.txt that lists everything you implemented. For instance, "Completed all parts. For Part 3 I did the extra credit for exact ages."
Run your program and have a full conversation with it that illustrates all your functionality. Screenshot it and submit a .png called conversation.png
Use the command line to submit your files:
submit -c=sd211 -p=lab03 basic.py chatbot.py readme.txt conversation.png
Log in to our submit system and submit the following files:
basic.py
chatbot.py
readme.txt
conversation.png