The next lab will use a library called NLTK. Today we will look in a little more detail at what's involved when you use ANY library, as well as at the tools NLTK gives us. We'll use it as a running example for going deeper into lists that contain things beyond primitives. You can find several other simple examples on this tutorial webpage.
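As a preview of the "lists beyond primitives" idea, here is a minimal sketch of the kind of list we'll be working with: a list whose elements are tuples rather than plain strings or numbers. (The sample data below is made up for illustration; it is not actual NLTK output.)

```python
# A list of (word, tag) tuples -- each element is a tuple, not a primitive.
tagged = [("John", "NNP"), ("left", "VBD"), ("the", "DT"), ("store", "NN")]

# Unpack each tuple directly in the for-loop header.
for word, tag in tagged:
    print(word, "is tagged", tag)

# Indexing once gives a tuple; indexing again gives a string.
first = tagged[0]    # ("John", "NNP")
print(first[0])      # John
```

Try indexing `tagged` yourself in the interpreter to see the tuples it contains.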
conda install nltk
One thing NLTK does for you is split English text into "tokens": words and punctuation marks. It also includes English-language processors, such as a part-of-speech (POS) tagger, that come in handy. Below is an example program that tags every word with its POS and pulls out just the nouns.
You can run this one directly from here:
import nltk  # of course
nltk.download('punkt')                        # tokenizer data, downloaded once
nltk.download('averaged_perceptron_tagger')   # machine-learned POS model, downloaded once

def extract_nouns(sentence):
    """ Given a string, this returns a list of strings: the sentence's nouns """
    # Try printing each of these out to see what they contain.
    tokens = nltk.word_tokenize(sentence)
    tagged = nltk.pos_tag(tokens)
    # Get the nouns!
    nouns = list()
    for word, tag in tagged:
        if tag.startswith('NN'):  # if the part of speech starts with NN, it's a noun
            nouns.append(word)
    return nouns

sent = input("Sentence? ")  # John left the store after buying some peaches.
nouns = extract_nouns(sent)
print("Your nouns are: ", nouns)
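The filtering loop inside extract_nouns can also be written as a list comprehension. The sketch below runs on a hand-written tagged list (the tags are assumed stand-ins for what nltk.pos_tag would return) so you can try it without downloading any NLTK data.

```python
# Hand-tagged stand-in for nltk.pos_tag(...) output (tags assumed, not computed).
tagged = [("John", "NNP"), ("left", "VBD"), ("the", "DT"),
          ("store", "NN"), ("after", "IN"), ("buying", "VBG"),
          ("some", "DT"), ("peaches", "NNS")]

# Same logic as the for-loop in extract_nouns, in one line:
nouns = [word for word, tag in tagged if tag.startswith('NN')]
print(nouns)  # ['John', 'store', 'peaches']
```

Both versions build the same list; the comprehension just packs the loop, the test, and the append into a single expression.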