This is a database of English verbs with a learned distribution over each verb/event's typical duration in the real world. For example, “war” typically lasts years or decades, while “look” lasts seconds or minutes. We learned the duration of events by querying the Web for lexical patterns (e.g., “he looked for 3 minutes”) and observing how often each word occurs with a specific time duration. Details on the learning approach can be found here:
Using Query Patterns to Learn the Duration of Events
Andrey Gusev, Nathanael Chambers, Pranav Khaitan, Divye Khilnani, Steven Bethard, Dan Jurafsky
IWCS-2010, England. 2010.
The lexicon is a text file, intended to be both human-readable and easily parsable. Each line in the lexicon represents a single event with its duration distribution. The verb-object event combinations are sorted by frequency of occurrence in the NYT portion of Gigaword, with the 10 most frequent grammatical objects for each verb.
EVENT=to be,ID=e1-1,OBJ=it,PATTERNS=2,DISTR=[0.290;0.105;0.182;0.100;0.059;0.098;0.106;0.060;]
EVENT - base event verb form
ID - unique id of the event; the first number is the verb number and the second is the object number. For example, e1-2 is the most frequent verb with its second most frequent object
OBJ - the object of the event
PATTERNS - number of query patterns above the threshold from which the distribution in DISTR was constructed
DISTR - the distribution of hits derived from the query patterns: [seconds;minutes;hours;days;weeks;months;years;decades]
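The field layout above can be read with a few lines of Python. This is a minimal sketch, not the authors' tooling: it assumes every line follows the same shape as the sample entry shown above (including the trailing semicolon inside DISTR), and the names `Entry`, `parse_line`, and `most_likely_unit` are hypothetical helpers introduced here for illustration.

```python
import re
from typing import Dict, NamedTuple

# Bucket order as documented for the DISTR field.
UNITS = ("seconds", "minutes", "hours", "days",
         "weeks", "months", "years", "decades")

# Assumed line shape, taken from the single sample entry in this document.
LINE_RE = re.compile(
    r"^EVENT=(?P<event>.*?),ID=(?P<id>[^,]*),OBJ=(?P<obj>.*?),"
    r"PATTERNS=(?P<patterns>\d+),DISTR=\[(?P<distr>[^\]]*)\]$"
)


class Entry(NamedTuple):
    event: str
    event_id: str
    obj: str
    patterns: int
    distr: Dict[str, float]  # maps each time unit to its probability mass


def parse_line(line: str) -> Entry:
    """Parse one lexicon line into an Entry, or raise ValueError."""
    m = LINE_RE.match(line.strip())
    if m is None:
        raise ValueError(f"unrecognized lexicon line: {line!r}")
    # Skip the empty string produced by the trailing ';'.
    values = [float(v) for v in m.group("distr").split(";") if v.strip()]
    if len(values) != len(UNITS):
        raise ValueError(f"expected {len(UNITS)} buckets, got {len(values)}")
    return Entry(
        event=m.group("event"),
        event_id=m.group("id"),
        obj=m.group("obj"),
        patterns=int(m.group("patterns")),
        distr=dict(zip(UNITS, values)),
    )


def most_likely_unit(entry: Entry) -> str:
    """Return the time unit with the largest probability mass."""
    return max(entry.distr, key=entry.distr.get)


# The sample entry from this page.
SAMPLE = ("EVENT=to be,ID=e1-1,OBJ=it,PATTERNS=2,"
          "DISTR=[0.290;0.105;0.182;0.100;0.059;0.098;0.106;0.060;]")

entry = parse_line(SAMPLE)
print(entry.event, entry.obj, most_likely_unit(entry))  # to be it seconds
```

A regular expression is used instead of a plain comma split so that a malformed line fails loudly rather than yielding misaligned fields. Lines that deviate from the sample shape (for instance, entries without an object, if the lexicon contains any) would need the pattern adjusted.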
Duration Lexicon Database: download
The following lists of document names are the train/test splits used for training and evaluation in the above paper.
Files |
Training |
"APW19980213.1310.tmldur.xml", "NYT19980424.0421.tmldur.xml", "APW19980227.0468.tmldur.xml", "CNN19980223.1130.0960.tmldur.xml", "APW19980308.0201.tmldur.xml", "CNN19980222.1130.0084.tmldur.xml", "CNN19980126.1600.1104.tmldur.xml", "ABC19980120.1830.0957.tmldur.xml", "VOA19980305.1800.2603.tmldur.xml", "APW19980301.0720.tmldur.xml", "APW19980526.1320.tmldur.xml", "PRI19980303.2000.2550.tmldur.xml", "APW19980227.0476.tmldur.xml", "APW19980306.1001.tmldur.xml", "APW19980213.1320.tmldur.xml", "ea980120.1830.0456.tmldur.xml", "PRI19980115.2000.0186.tmldur.xml", "PRI19980121.2000.2591.tmldur.xml", "PRI19980205.2000.1998.tmldur.xml", "ed980111.1130.0089.tmldur.xml", "APW19980219.0476.tmldur.xml", "APW19980213.1380.tmldur.xml", "PRI19980306.2000.1675.tmldur.xml", "APW19980418.0210.tmldur.xml", "NYT19980206.0466.tmldur.xml", "VOA19980331.1700.1533.tmldur.xml", "VOA19980501.1800.0355.tmldur.xml", "APW19980626.0364.tmldur.xml", "CNN19980227.2130.0067.tmldur.xml", "ea980120.1830.0071.tmldur.xml", "AP900816-0139.tmldur.xml", "VOA19980303.1600.2745.tmldur.xml", "APW19980322.0749.tmldur.xml", "CNN19980213.2130.0155.tmldur.xml", "ABC19980108.1830.0711.tmldur.xml", "PRI19980205.2000.1890.tmldur.xml", "APW19980227.0489.tmldur.xml", "PRI19980213.2000.0313.tmldur.xml" |
Test |
"ABC19980114.1830.0611.tmldur.xml", "ABC19980304.1830.1636.tmldur.xml", "APW19980227.0494.tmldur.xml", "APW19980501.0480.tmldur.xml", "NYT19980206.0460.tmldur.xml", "NYT19980212.0019.tmldur.xml", "NYT19980402.0453.tmldur.xml", "PRI19980216.2000.0170.tmldur.xml", "SJMN91-06338157.tmldur.xml", "VOA19980303.1600.0917.tmldur.xml" |
WSJ Test |
"wsj_0006.tmldur.xml", "wsj_0026.tmldur.xml", "wsj_1025.tmldur.xml", "wsj_1031.tmldur.xml", "wsj_1035.tmldur.xml", "wsj_1038.tmldur.xml", "wsj_1039.tmldur.xml", "wsj_1040.tmldur.xml", "wsj_1042.tmldur.xml", "wsj_1073.tmldur.xml" |