SI485i, Fall 2012

Lab 5: Ease into Probabilistic Syntax

Due date: the start of class, Oct 18

Motivation

Before we build full PCFG parsers, this lab will introduce you to some important aspects of English syntax, and you will estimate basic probabilities for a few grammar rules that we will loosely define. Next week, you will work with an actual parser and learn real PCFGs.

Verb Tense and Aspect

Verbs have many forms. You will create VP grammar rules to cover many of them, and then estimate their probabilities by hand. Your job is to focus on the verb leave. You must write grammar rules (e.g., VP -> VBG NP) for all of the following tenses and aspects. Each should have its own unique VP -> X X rule! You will be graded on how accurately your rules capture these forms and on how well they rule out other forms. IMPORTANT: assume the verb leave takes one noun phrase argument, so each rule should include the appropriate NP!

  1. Present tense: "leave", "leaves"
  2. Present perfect: "has left", "have left"
  3. Present progressive: "is leaving", "are leaving", "am leaving"
  4. Past tense: "left"
  5. Past perfect: "had left"
  6. Past progressive: "was leaving", "were leaving"
  7. Future: "will leave"

The second part of this section is to estimate the probability of each rule, e.g., P(VP -> VBG NP), using our same Twitter dataset from the last lab. You'll want to write some Java code to search the tweet strings and output the count of each rule. Note that the future and present forms both contain "leave" (and the perfect forms contain "left"), so take care not to double count across tenses. Remember that P(VP -> VBG NP) = P(VBG NP | VP). Your probabilities should of course sum to one!
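
Here is a minimal Java sketch of one way to do the counting, assuming you already have each tweet as a String (see the Code Setup section below). The class name, the regular expressions, and the one-tense-per-tweet simplification are my own illustrative choices, not part of the provided lab code, so adjust them to whatever rules you actually wrote.

  import java.util.LinkedHashMap;
  import java.util.List;
  import java.util.Map;
  import java.util.regex.Pattern;

  public class LeaveCounter {

    // Checked in insertion order: multi-word forms first, bare forms last,
    // so "will leave" counts as future rather than present, and so on.
    private static final Map<String, Pattern> FORMS = new LinkedHashMap<String, Pattern>();
    static {
      FORMS.put("present perfect",     Pattern.compile("\\b(has|have) left\\b"));
      FORMS.put("past perfect",        Pattern.compile("\\bhad left\\b"));
      FORMS.put("present progressive", Pattern.compile("\\b(is|are|am) leaving\\b"));
      FORMS.put("past progressive",    Pattern.compile("\\b(was|were) leaving\\b"));
      FORMS.put("future",              Pattern.compile("\\bwill leave\\b"));
      FORMS.put("present",             Pattern.compile("\\bleaves?\\b"));
      FORMS.put("past",                Pattern.compile("\\bleft\\b"));
    }

    // Returns the first tense/aspect whose pattern matches, or null.
    // Simplification: a tweet contributes at most one count.
    public static String classify(String tweet) {
      String text = tweet.toLowerCase();
      for (Map.Entry<String, Pattern> entry : FORMS.entrySet())
        if (entry.getValue().matcher(text).find())
          return entry.getKey();
      return null;
    }

    // Tally counts over all tweets and print P(rule) = count / total.
    public static void printProbabilities(List<String> tweets) {
      Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
      int total = 0;
      for (String tweet : tweets) {
        String tense = classify(tweet);
        if (tense == null) continue;
        Integer old = counts.get(tense);
        counts.put(tense, old == null ? 1 : old + 1);
        total++;
      }
      for (Map.Entry<String, Integer> e : counts.entrySet())
        System.out.printf("%-20s count=%-6d P=%.4f%n",
            e.getKey(), e.getValue(), (double) e.getValue() / total);
    }
  }

The key design point is the ordering: the multi-word patterns run before the bare "leave"/"leaves"/"left" patterns, which is one simple way to avoid the double-counting problem above.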

After you finish the verb leave, pick some other verb that occurs many times, and compute the probabilities just based on that verb. The VP rules should be unchanged from leave if you did this correctly (except to substitute your new verb in, of course)!

Wh-Question Syntax

Asking questions in English is fairly systematic: there are relatively well-defined rules to transform a normal English sentence into a wh-question. Take this sentence as an example:

"I ate the bread" -> "What did I eat?"

Below are several sentences with a phrase in bold. Your task is to remove that phrase and ask about it using a wh-word. Step one is to rewrite the sentence as a question. Step two is to draw a parse tree for the question. Step three is to come up with the transformation rules that morph the sentence into the question (e.g., "remove the NP and put 'what' at the beginning of the sentence"). Step four is to find examples in the Twitter data that start with the same wh-words (a small search sketch follows the list below). List at least 5 examples each, and see if they match your transformation rules. If not, fix your rules.

  1. John picked up the chair.
  2. I am going to the big mall tomorrow.
  3. Susan decided to leave John.
  4. Susan decided to leave John.
  5. We thought about eating the burritos. (not a wh-question; make this one a yes/no question)
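
For step four, here is a short sketch in the same spirit (again, the class and method names are illustrative, not provided code) that pulls out tweets beginning with a given wh-word:

  import java.util.ArrayList;
  import java.util.List;

  public class WhExamples {

    // Returns up to max tweets whose first word is the given wh-word
    // (case-insensitive), e.g. startingWith(tweets, "what", 5).
    public static List<String> startingWith(List<String> tweets, String whWord, int max) {
      String prefix = whWord.toLowerCase() + " ";
      List<String> hits = new ArrayList<String>();
      for (String tweet : tweets) {
        if (hits.size() >= max) break;
        if (tweet.trim().toLowerCase().startsWith(prefix))
          hits.add(tweet);
      }
      return hits;
    }
  }

If you took the grep route instead, a case-insensitive search anchored at the start of the line (grep -i '^what ') does the same job.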

If you wish, there is a PDF template here and a Word template here.

Code Setup

Create a new lab5 directory. You can do this lab in Java, or you can just use grep and some Unix tools like wc. If you want to use Java, reuse the base code from Lab 4: copy lab4/java/ to your lab5 directory (cp -R /courses/nchamber/nlp/lab4/java lab5/). The Datasets.java class will give you your tweets one at a time, and you can search their strings! See Lab 4's description for more code details.
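
If it helps, here is the rough shape of a driver that ties the sketches above together. The Datasets calls are commented out because the exact interface is whatever Lab 4 gives you; the loop shown is only a guess at its shape, so check Lab 4's description before copying it.

  import java.util.ArrayList;
  import java.util.List;

  public class Lab5 {
    public static void main(String[] args) {
      List<String> tweets = new ArrayList<String>();

      // Hypothetical: fill the list using Lab 4's Datasets class, whatever
      // its real methods are. Something along the lines of:
      //   Datasets data = new Datasets();
      //   while (data.hasNext()) tweets.add(data.next());

      // Verb tense counts and probabilities (sketch from the VP section).
      LeaveCounter.printProbabilities(tweets);

      // Five example tweets starting with "what" (sketch from the wh section).
      for (String example : WhExamples.startingWith(tweets, "what", 5))
        System.out.println(example);
    }
  }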

Helpful POS Tags

The Penn Treebank verb tags should be all you need for your VP rules:

  VB   base form (leave)
  VBP  present tense, not 3rd person singular (leave)
  VBZ  present tense, 3rd person singular (leaves)
  VBD  past tense (left)
  VBN  past participle (left)
  VBG  present participle (leaving)
  MD   modal (will)

What to turn in

  1. Two grammars of VP rules: one for the verb leave and one for your verb of choice. Both should have probabilities attached to the rules, as well as the raw counts for each rule, so we can see how you computed your probabilities (and give partial credit). Print it all out.
  2. A printout of your Wh-Question section. Use the template linked above for easiest formatting.

How to turn in

No auto-submit. Print it out, staple, and hand in on the due date.

Grading

Verb Phrases (30 pts)

Wh-Questions (25 pts)

Total: 55 pts