CAEVO: CAscading EVent Ordering system

CAEVO is a complete, publicly available event ordering system that goes from text to temporal ordering. It produces dense temporal graphs over events and time expressions, and operates either on raw text or on text pre-annotated with events and timexes. Specifically, CAEVO provides the following functionality:

Event extraction (based on NavyTime, identifies event words in text)
Time Expression extraction (based on SUTime, identifies time expressions in text)
Temporal relation extraction (labels pairs of events/times)
Transitive inference over temporal relations

CAEVO is a sieve-based architecture, so you can integrate your own code seamlessly. You just need to implement the Sieve Java interface and its required methods, and the CAEVO architecture will do the rest, including transitive closure and consistency checks.
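To make the transitive closure and consistency check concrete, here is a minimal sketch restricted to a single BEFORE relation. It is not CAEVO's implementation: the class name BeforeClosure, its methods, and the map-based graph are assumptions made for illustration, and CAEVO reasons over its full relation set rather than BEFORE alone.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of CAEVO.
class BeforeClosure {
    // before.get(a) holds every entity that a is known to precede.
    // Returns a copy in which every inferable BEFORE link has been added.
    static Map<String, Set<String>> close(Map<String, Set<String>> before) {
        Map<String, Set<String>> closed = new HashMap<>();
        for (Map.Entry<String, Set<String>> e : before.entrySet()) {
            closed.put(e.getKey(), new HashSet<>(e.getValue()));
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Set<String> successors : closed.values()) {
                // If a BEFORE mid and mid BEFORE c, then a BEFORE c.
                Set<String> inferred = new HashSet<>();
                for (String mid : successors) {
                    inferred.addAll(closed.getOrDefault(mid, Collections.<String>emptySet()));
                }
                if (successors.addAll(inferred)) {
                    changed = true;
                }
            }
        }
        return closed;
    }

    // A closed graph is inconsistent if some entity ends up BEFORE itself.
    static boolean isConsistent(Map<String, Set<String>> closed) {
        for (Map.Entry<String, Set<String>> e : closed.entrySet()) {
            if (e.getValue().contains(e.getKey())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> before = new HashMap<>();
        before.put("e1", new HashSet<>(Arrays.asList("e2")));
        before.put("e2", new HashSet<>(Arrays.asList("t1")));
        Map<String, Set<String>> closed = close(before);
        System.out.println(closed.get("e1").contains("t1")); // prints true
        System.out.println(isConsistent(closed));            // prints true
    }
}
```

The fixed-point loop is the simplest way to show the idea; it is quadratic or worse on large graphs, which is acceptable for a sketch.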

Download CAEVO

The code is available on GitHub. Maven compiles the code and manages its dependencies. The only external requirement not handled by Maven is WordNet. See the Requirement note below for details.

New to GitHub? There is a Download ZIP button on the page, or you can clone the git repository (the more common route). If you don't know git, that's OK: just download the code as a ZIP file and you're ready to go. You won't easily get future updates this way, however.

Requirement: Download WordNet 3.1 dictionaries. Edit the jwnl_file_properties.xml file to point to the path of your downloaded dictionaries. Create an environment variable JWNL that points to this properties file.
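If WordNet fails to load, the JWNL variable is the first thing to check. The following is a small sketch of such a check; the class JwnlCheck and its messages are hypothetical and not part of CAEVO. It only verifies that the variable is set and points to an existing file, not that the file's dictionary path is correct.

```java
import java.io.File;

// Hypothetical helper, not part of CAEVO.
class JwnlCheck {
    // Returns a description of the problem, or null when the setting looks usable.
    static String problem(String jwnlValue) {
        if (jwnlValue == null || jwnlValue.isEmpty()) {
            return "JWNL is not set";
        }
        if (!new File(jwnlValue).isFile()) {
            return "JWNL does not point to a file: " + jwnlValue;
        }
        return null;
    }

    public static void main(String[] args) {
        // Reads the environment variable described above.
        String problem = problem(System.getenv("JWNL"));
        System.out.println(problem == null ? "JWNL looks fine" : problem);
    }
}
```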

Run Event Extractor Only

runevents.sh -model src/main/resources/models <text-file> raw

This will create a new text file text-file.withevents that is identical to the input text file, but with the event words marked up. It will also create a full XML file text-file.info.xml. You can try src/test/resources/testing-events.txt as an input text file to see how it works.

Run CAEVO (command-line)

runcaevoraw.sh <text-file|directory>
Runs CAEVO on a raw text file or directory of text files. It annotates events, times, and temporal relations.
Working example: runcaevoraw.sh src/test/resources/news.txt

runcaevoxml.sh <text-file|directory>
Runs CAEVO on a single file or directory of XML files where the XML has a TEXT element containing raw text. Output is a different XML file containing all of the files' annotations.
Working example: runcaevoxml.sh src/test/resources/news.xml

runcaevo.sh
Runs CAEVO on a pre-processed XML file from Tempeval-3 that is included in the distribution. See the shell script itself for different XML files available to you. One of the provided XML files is already hard-coded in the script.

runmarkup.sh <xml-info-file>
Reads the specified XML file, which is typically the output of the run scripts above. This reproduces the original text input files, marked up with events, timexes, and tlinks.

Run CAEVO (as an API)

CAEVO was written to be easily extensible. Below are the major classes for those interested in writing their own sieves to plug in to CAEVO.

caevo.Main
The main code that reads from the command line and controls the architecture.
caevo.SieveDocuments
The class that you must instantiate to hold your documents. The core methods (markupEvents, markupTimexes, runSieves) all operate on a given SieveDocuments parameter.
caevo.SieveDocument
The class that represents a single document. It stores a list of SieveSentence objects that represent the document's sentences. It also stores the events, timexes, tlinks, and document creation times for the document. Most of the methods that you will ever need for reasoning over events and time are found in this object.
caevo.SieveSentence
The class that stores the parse tree, typed dependencies, events, and timexes for a single sentence. These are typically populated by methods in caevo.Main.
caevo.TextEvent
Represents a single textual event and its attributes (e.g., tense). It also manages the XML representation of the event.
caevo.Timex
Represents a single timex and its attributes. It also manages the XML representation of the timex.
caevo.tlink.TLink
There are three TLink classes that inherit from TLink (EventEventLink, EventTimeLink, TimeTimeLink). The TLink class contains the methods for basic temporal reasoning, including inverting relations and determining if two relations conflict.
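As an illustration of what inverting a relation and detecting a conflict mean, here is a self-contained sketch. The Relation enum and its method names are hypothetical stand-ins, not the TLink API, and the six relations listed are an assumed label set; consult the TLink source for the real one.

```java
// Hypothetical relation set for illustration; CAEVO's TLink defines its own types.
enum Relation {
    BEFORE, AFTER, INCLUDES, IS_INCLUDED, SIMULTANEOUS, VAGUE;

    // The relation that holds when the two arguments are swapped.
    Relation invert() {
        switch (this) {
            case BEFORE: return AFTER;
            case AFTER: return BEFORE;
            case INCLUDES: return IS_INCLUDED;
            case IS_INCLUDED: return INCLUDES;
            default: return this; // SIMULTANEOUS and VAGUE are symmetric
        }
    }

    // Two labels for the same ordered pair conflict when they differ.
    // Simplifying assumption: VAGUE is compatible only with VAGUE.
    boolean conflictsWith(Relation other) {
        return this != other;
    }
}

class RelationDemo {
    // Compares a link (a, b, forward) with a link stated the other way round (b, a, backward).
    static boolean conflictsReversed(Relation forward, Relation backward) {
        return forward.conflictsWith(backward.invert());
    }

    public static void main(String[] args) {
        System.out.println(Relation.INCLUDES.invert());                          // prints IS_INCLUDED
        System.out.println(conflictsReversed(Relation.BEFORE, Relation.AFTER));  // prints false
        System.out.println(conflictsReversed(Relation.BEFORE, Relation.BEFORE)); // prints true
    }
}
```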

Sieve Order

CAEVO is a cascade of sieves. Which sieves run, and in what order, is determined by a text file that is read at runtime. Look at default.sieves to see the list. The runcaevo.sh script passes this file in as the Java argument sieves.

There is also a text file of properties for each sieve that can be changed for different runs of CAEVO. The text file is default.properties and is passed in as a Java argument props.

Ablation tests are easy. Just comment out the sieves you wish to remove.
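The effect of commenting out a sieve can be sketched as follows. This is not CAEVO's loader: the class SieveListReader is hypothetical, the sieve names are placeholders, and treating lines that start with "#" or "//" as comments is an assumption, so check default.sieves for the marker it actually uses.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of CAEVO.
class SieveListReader {
    // Returns the active sieve names in file order, skipping blank and commented-out lines.
    // Assumption: a line starting with "#" or "//" counts as commented out.
    static List<String> activeSieves(List<String> lines) {
        List<String> active = new ArrayList<>();
        for (String line : lines) {
            String name = line.trim();
            if (name.isEmpty() || name.startsWith("#") || name.startsWith("//")) {
                continue;
            }
            active.add(name);
        }
        return active;
    }

    public static void main(String[] args) {
        // Placeholder sieve names; SieveB is ablated by commenting it out.
        List<String> lines = Arrays.asList("SieveA", "# SieveB", "", "SieveC");
        System.out.println(activeSieves(lines)); // prints [SieveA, SieveC]
    }
}
```

Because the list preserves file order, moving a line up or down changes where that sieve sits in the cascade.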

Create Your Own Sieve

You can plug any working Java code into the CAEVO framework. All you need to do is implement the caevo.sieves.Sieve interface. There are only two methods you have to include:

public List<TLink> annotate(SieveDocument doc, List<TLink> currentTLinks);

public void train(SieveDocuments infoDocs);

The train method is for learning sieves that require a training stage. Rule-based sieves can just have an empty train method. The annotate method takes a single SieveDocument and a list of current TLinks. You must write this method to link any entity pairs in the given document that you wish to annotate. You can even ignore the given list of TLinks if you'd like. CAEVO will take care of repetitive annotations for you.

Take a look at the very simple sieve in caevo.sieves.BaselineEventDCT. It looks for any 'said' events, and creates an "is included" TLink with the document time stamp.
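The shape of such a rule-based sieve can be sketched without the CAEVO jar on the classpath. Everything prefixed with Demo below is a simplified stand-in invented for this example, not a CAEVO class; a real sieve would implement caevo.sieves.Sieve and work with SieveDocument and TLink objects instead. The rule mirrors the 'said' example described above.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-ins so the sketch compiles on its own; these are NOT CAEVO's classes.
class DemoEvent {
    final String id;
    final String word;
    DemoEvent(String id, String word) { this.id = id; this.word = word; }
}

class DemoLink {
    final String from;
    final String to;
    final String relation;
    DemoLink(String from, String to, String relation) {
        this.from = from; this.to = to; this.relation = relation;
    }
}

class DemoDocument {
    final List<DemoEvent> events = new ArrayList<>();
    final String creationTimeId = "t0"; // hypothetical id for the document creation time
}

interface DemoSieve {
    List<DemoLink> annotate(DemoDocument doc, List<DemoLink> currentLinks);
    void train(List<DemoDocument> docs);
}

// Rule-based sieve: every 'said' event is linked to the document creation time.
class SaidEventSieve implements DemoSieve {
    @Override
    public List<DemoLink> annotate(DemoDocument doc, List<DemoLink> currentLinks) {
        List<DemoLink> proposed = new ArrayList<>();
        for (DemoEvent event : doc.events) {
            if (event.word.equalsIgnoreCase("said")) {
                proposed.add(new DemoLink(event.id, doc.creationTimeId, "IS_INCLUDED"));
            }
        }
        return proposed; // currentLinks is ignored, which the framework permits
    }

    @Override
    public void train(List<DemoDocument> docs) {
        // Rule-based sieve: nothing to learn.
    }

    public static void main(String[] args) {
        DemoDocument doc = new DemoDocument();
        doc.events.add(new DemoEvent("e1", "said"));
        doc.events.add(new DemoEvent("e2", "bought"));
        List<DemoLink> links = new SaidEventSieve().annotate(doc, new ArrayList<DemoLink>());
        System.out.println(links.size()); // prints 1
    }
}
```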

Once you've created your sieve, you just have to "turn it on". Just edit the default.sieves list in the main directory. You can add your sieve anywhere in the gauntlet that you'd like, and see if it runs.

Need parameters? Does your sieve require dynamic parameters for different runs? Look in default.properties, which holds the parameters for the different sieves. Simply add your own, using the name of your Java class as the prefix. You can query for these in your class using the TimeSieveProperties class, which loads the properties at runtime. For example:

TimeSieveProperties.getBoolean("EventEventVagueSieve.considerTense", true);

This looks up the considerTense property to see if the EventEventVague sieve should use tense rules. The "true" parameter is the default value in case the property is not set.
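The lookup-with-default pattern can be sketched with java.util.Properties. This is not the TimeSieveProperties implementation: the class SievePropertiesSketch and the key MySieve.useTense are hypothetical, and the sketch assumes the properties file follows the standard Java key = value format.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical stand-in for a properties helper; not CAEVO's class.
class SievePropertiesSketch {
    private final Properties props;

    SievePropertiesSketch(Properties props) {
        this.props = props;
    }

    // Returns the boolean value stored under key, or defaultValue if the key is absent.
    boolean getBoolean(String key, boolean defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.trim());
    }

    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // Stands in for the contents of a properties file; the key is a made-up example.
        props.load(new StringReader("MySieve.useTense = false\n"));
        SievePropertiesSketch config = new SievePropertiesSketch(props);
        System.out.println(config.getBoolean("MySieve.useTense", true));  // prints false
        System.out.println(config.getBoolean("MySieve.notSet", true));    // prints true
    }
}
```

Prefixing each key with the class name keeps parameters for different sieves from colliding in the shared file.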

TimeBank-Dense: A Dense Event Ordering Corpus

A corpus of 12,000 temporal links between events and time expressions in 36 TimeBank documents. The dataset is split into training, dev, and test sets. The dev and test documents are listed in variables at the top of the evaluation code on GitHub.

Four versions are available for download.

1. Full annotation linking TimeBank entities.
This is the complete set of relations in the TimeBank-Dense annotation. Annotators labeled edges between all TimeBank events and timex entities (in the same sentence, and in neighboring sentences). Note that this is not the dataset used in the published work.
2. Full annotation linking TimeBank entities, but with fine-grained VAGUE relations.
This is the same as (1) above, but instead of a single VAGUE relation, it breaks the relation into three subtypes: Mutual Vague (annotators agreed on VAGUE), Partial Vague (at least one annotator chose VAGUE, but the other(s) did not), and None Vague (annotators disagreed on the relation, and none chose VAGUE).
3. Annotation filtered for TempEval-3 entities.
This is the same annotation as (1) above, but with certain relations removed. Specifically, any relation involving an event or timex that was not in the TempEval-3 dataset was removed. This is the dataset used in the experiments of Chambers et al. (2014).
4. Annotation filtered for TempEval-3 entities, but with fine-grained VAGUE relations.
This is the same as (3) above, but VAGUE is split into the fine-grained VAGUE relations.
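The three fine-grained VAGUE subtypes follow a simple rule over the annotators' labels, sketched below. The class VagueSubtype and the label strings are hypothetical and are not taken from the corpus files; the sketch only restates the definitions given above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper; label strings are illustrative, not the corpus format.
class VagueSubtype {
    // Maps the labels chosen by the annotators of one pair to a VAGUE subtype.
    // Returns null when all annotators agreed on a specific (non-VAGUE) relation.
    static String classify(List<String> labels) {
        if (labels.isEmpty()) {
            throw new IllegalArgumentException("at least one annotator label is required");
        }
        long vagueVotes = labels.stream().filter("VAGUE"::equals).count();
        boolean allAgree = labels.stream().distinct().count() == 1;
        if (vagueVotes == labels.size()) {
            return "MUTUAL_VAGUE";   // everyone chose VAGUE
        }
        if (vagueVotes > 0) {
            return "PARTIAL_VAGUE";  // some chose VAGUE, others did not
        }
        if (!allAgree) {
            return "NONE_VAGUE";     // disagreement, and nobody chose VAGUE
        }
        return null;                 // agreement on a specific relation
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("VAGUE", "VAGUE")));   // prints MUTUAL_VAGUE
        System.out.println(classify(Arrays.asList("VAGUE", "BEFORE")));  // prints PARTIAL_VAGUE
        System.out.println(classify(Arrays.asList("AFTER", "BEFORE")));  // prints NONE_VAGUE
    }
}
```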

Dense Annotation Tool

The TimeBank-Dense annotation required a tool that forces annotators to label all event/timex pairs. We built a custom command-line tool that reads TimeBank-formatted XML documents and prompts the annotator to choose a label for each relevant pair. The tool also contains its own transitive inference procedure, skipping those pairs that are inferable from the annotator's previous choices. The tool greatly speeds up annotation and helps maintain consistency.

The tool caevo.annotate.Annotator is available on GitHub with CAEVO itself.

Acknowledgements

This is a collaborative work between the following researchers: Nate Chambers, Bill McDowell, Taylor Cassidy, and Steve Bethard. The Center of Excellence at Johns Hopkins sponsored the work during a summer SCALE meeting. This would not have come about without their vision to bring together students and researchers from a diverse set of locations.

References

Nathanael Chambers, Bill McDowell, Taylor Cassidy, and Steve Bethard
Dense Event Ordering with a Multi-Pass Architecture.
Transactions of the ACL, to appear, 2014.

Taylor Cassidy, Bill McDowell, Nathanael Chambers, and Steven Bethard
An Annotation Framework for Dense Event Ordering.
ACL-2014, Baltimore, MD. 2014.