A database of Narrative Schemas: structured sets of related events, semantic roles (the actors involved), and a temporal ordering of the events. See the publications below for details on the learning algorithm.
Since schemas are essentially clusters of verbs and their actors, they can grow arbitrarily large. I am thus providing several sizes of schemas for download; each cluster is grown until it reaches the maximum size. While cutoff scores could be used to control cluster size instead, we are not convinced that this produces better schemas, so we only cut off at a specified size. However, we include each verb's score within its schema so applications can make use of the scores for further tuning if they so choose.
Schemas of size 6: download
Schemas of size 8: download
Schemas of size 10: download
Schemas of size 12: download
Verb Temporal Orderings: download
The format is plain text, not XML. It is easily readable by humans, but also structured for machine input. The following is an example schema with line-by-line descriptions; a short parsing sketch follows the walkthrough.
*****
Schemas are separated by a line with five asterisks.
score=24.893072
The first line gives the score for the overall schema: the sum of all pairwise verb slot scores among the verbs in the schema (the score is a mix of mutual information and argument count overlap; see the published papers).
Events: enact pass amend approve repeal require enforce intend
Events is a space-separated line of all verbs in the schema.
Scores: 7.587 7.281 6.236 6.117 6.016 5.532 5.523 5.493
Scores is a space-separated line of the individual verb scores, aligned positionally with the verbs on the previous line. Verbs and their scores are sorted in descending order, so the verb with the strongest connection to all other verbs in the schema appears first.
[ approve-s intend-s require-o repeal-s enact-s enforce-s amend-s pass-s ] ( congress 10.435 legislature 8.179 house 7.966 state 7.753 company 7.645 bill 7.589 senate 7.476 government 7.466 board 7.456 commission 7.337 measure 7.328 president 7.305 committee 7.295 fda 7.290 agency 7.282 assembly 7.262 republican 7.235 student 7.217 council 7.202 official 7.194 department 7.194 group 7.194 china 7.147 parliament 7.147 voter 7.147 bell 7.129 driver 7.129 maker 7.129 mof 7.129 leaders 7.129 nasa 7.106 luggage 7.106 candidate 7.106 city 7.106 buyer 7.106 majority 7.106 nasdaq 7.106 traveler 7.106 ways 7.106 panel 7.106 clinton 7.106 israel 7.106 sakamaki 7.106 visitor 7.106 fcc 7.106 dialogue 7.106 lawmaker 7.106 leach 7.106 thursday 7.106 -rrb- 7.106 authority 7.106 law 7.106 morton 7.106 shareholder 7.106 eeoc 7.106 iraq 7.106 two 7.106 plan 7.106 )
Each line starting with a square bracket is a role definition. It defines a single schema role, and first lists the subject/object/preposition positions that the role fills. In this example, almost all of the verbs in the schema have their subjects filled by this role (the exception is 'require', whose object this role fills). The verb-s syntax indicates a subject position, verb-o an object position, and verb-p a prepositional argument. Following the square brackets is a parenthetical list of role types (the head words of observed arguments) and their scores with this particular role. The scores are calculated from observed argument counts between pairs of verbs in the chain. All arguments and scores are space-separated; there are no multi-word head words.
[ enforce-o repeal-o require-s amend-o pass-o approve-o intend-o enact-o ] ( law 14.458 bill 12.606 rule 11.711 laws 11.681 act 11.604 legislation 11.460 measure 11.161 agreement 11.153 plan 11.091 amendment 11.007 resolution 10.964 ordinance 10.826 budget 10.772 regulation 10.738 test 10.726 settlement 10.717 tax 10.702 proposal 10.694 contract 10.694 ix 10.671 transaction 10.638 initiative 10.629 senate 10.606 merger 10.606 treaty 10.606 deal 10.583 ban 10.583 constitution 10.563 version 10.550 company 10.536 zenapax 10.536 motion 10.518 thursday 10.518 split 10.518 congress 10.518 services 10.518 takeover 10.518 bid 10.518 faaa 10.495 nominee 10.495 limit 10.495 charter 10.495 policy 10.495 car 10.495 expansion 10.495 guideline 10.495 order 10.495 decree 10.495 university 10.495 provision 10.495 )
The same format as the previous line, but this defines a different role: here, almost all of the verbs' objects are filled by this role (while 'require', whose subject it fills, is the exception).
The Verb Temporal Orderings file contains pairwise counts of how often one verb was classified as occurring before another. The pair of verbs is listed in order: for instance, "A B 23" indicates that A was classified as before B 23 times, while "B A 4" would indicate B before A (equivalently, A after B) 4 times. Each line is a tab-separated triple: the two verbs and the count.
See the publications below for details, but keep in mind that state-of-the-art temporal classification still has a long way to go. These counts come from supervised classifiers, so there are large numbers of unseen pairs with unreliable counts. When deciding whether A is before B, you may want to compare the A B and B A counts against each other before making a decision.
This material is based upon work supported by the National Science Foundation under Grant No. IIS-0811974.
Unsupervised Learning of Narrative Schemas and their Participants
Nathanael Chambers and Dan Jurafsky
ACL-09, Singapore. 2009.
Unsupervised Learning of Narrative Event Chains
Nathanael Chambers and Dan Jurafsky
ACL-08, Ohio, USA. 2008.
Classifying Temporal Relations Between Events
Nathanael Chambers, Shan Wang, Dan Jurafsky
ACL-07, Prague. 2007.