The CAEVO architecture is a complete, publicly available, text-to-order event ordering system. It produces dense temporal graphs over events and time expressions, and operates either on raw text or on text pre-annotated with events/timexes. Specifically, CAEVO provides the following functionality:
CAEVO is a sieve-based architecture, so you can integrate your own code seamlessly. You just need to implement the Sieve Java interface and its required methods; the CAEVO architecture does the rest, including transitive closure and consistency checks.
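As a rough illustration of how a sieve cascade can combine proposals, here is a minimal Java sketch (Java 16+ for records). Everything in it is hypothetical: Before, ToySieve, and ToyCascade are toy stand-ins, not CAEVO's classes, and only the BEFORE relation is modeled. Sieves run in order; a proposed link is kept only if it is new and does not contradict the current graph, and the graph is transitively closed after every addition, so in this sketch later sieves cannot override earlier decisions.

```java
import java.util.*;

// Hypothetical toy types for illustration; these are not CAEVO classes.
// A link Before(a, b) means "a happens BEFORE b".
record Before(String first, String second) {}

interface ToySieve {
    // Proposes links over the given entity ids; may ignore the links found so far.
    List<Before> annotate(List<String> entities, Set<Before> current);
}

class ToyCascade {
    // Runs the sieves in order. A proposed link is kept only if it is new and does
    // not contradict the current graph, so earlier sieves take precedence.
    static Set<Before> run(List<String> entities, List<ToySieve> sieves) {
        Set<Before> graph = new LinkedHashSet<>();
        for (ToySieve sieve : sieves) {
            for (Before link : sieve.annotate(entities, Collections.unmodifiableSet(graph))) {
                boolean selfLoop = link.first().equals(link.second());
                boolean contradicts = graph.contains(new Before(link.second(), link.first()));
                if (!selfLoop && !contradicts && graph.add(link)) {
                    closeTransitively(graph);
                }
            }
        }
        return graph;
    }

    // Adds (a BEFORE c) whenever (a BEFORE b) and (b BEFORE c), until nothing changes.
    static void closeTransitively(Set<Before> graph) {
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Before x : List.copyOf(graph)) {
                for (Before y : List.copyOf(graph)) {
                    if (x.second().equals(y.first()) && !x.first().equals(y.second())) {
                        changed |= graph.add(new Before(x.first(), y.second()));
                    }
                }
            }
        }
    }
}
```

For example, if a first sieve proposes e1 BEFORE e2 and e2 BEFORE t1, closure adds e1 BEFORE t1; a later proposal of t1 BEFORE e1 is then rejected as inconsistent. Because the graph is kept closed, any cycle would show up as a directly reversed link, which is why a single lookup suffices for the consistency check here.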
The code is available on GitHub. Maven compiles it and manages its dependencies. The only external requirement not handled by Maven is WordNet; see the run instructions below for details.
New to GitHub? The repository page has a Download ZIP button, or you can clone the Git repository (the more common approach). If you don't know Git, that's OK: just download the code as a ZIP file and you're ready to go. You won't easily get future updates this way, however.
Requirement: Download WordNet 3.1 dictionaries. Edit the jwnl_file_properties.xml file to point to the path of your downloaded dictionaries. Create an environment variable JWNL that points to this properties file.
runevents.sh -model src/main/resources/models <text-file> raw
This will create a new text file text-file.withevents that is identical to the input text file, but with the event words marked up. It will also create a full XML file text-file.info.xml. You can try src/test/resources/testing-events.txt as an input text file to see how it works.
runcaevoraw.sh <text-file|directory>
Runs CAEVO on a raw text file or directory of text files. It annotates events, times, and temporal relations.
Working example: runcaevoraw.sh src/test/resources/news.txt
runcaevoxml.sh <xml-file|directory>
Runs CAEVO on a single XML file or a directory of XML files, where each XML file has a TEXT element containing raw text. The output is a new XML file containing the annotations for all of the input files.
Working example: runcaevoxml.sh src/test/resources/news.xml
runcaevo.sh
Runs CAEVO on a pre-processed TempEval-3 XML file that is included in the distribution. One of the provided XML files is already hard-coded in the script; see the script itself for the other XML files available to you.
runmarkup.sh <xml-info-file>
Takes the specified XML info file, typically the output of one of the run scripts above, and reproduces the original input text files marked up with events, timexes, and TLinks.
CAEVO was written to be easily extensible. Below are the major classes for those interested in writing their own sieves to plug into CAEVO.
CAEVO is a cascade of sieves. Which sieves run, and in what order, is set by a text file listing the sieves that is read at runtime: default.sieves. The runcaevo.sh script passes this file in as the Java argument sieves.
A second text file, default.properties, holds properties for each sieve that can be changed between runs of CAEVO. It is passed in as the Java argument props.
Ablation tests are easy. Just comment out the sieves you wish to remove.
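Reading the sieve list takes only a few lines of Java. This sketch assumes one sieve class name per line and treats blank lines and lines starting with "#" or "//" as commented out; both the file format and the class SieveListParser are assumptions for illustration, so check default.sieves for the actual comment convention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of CAEVO.
class SieveListParser {
    // Assumed format: one sieve class name per line; blank lines and lines
    // starting with "#" or "//" are skipped (i.e. commented out for ablation).
    static List<String> activeSieves(List<String> lines) {
        List<String> active = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.strip();
            if (line.isEmpty() || line.startsWith("#") || line.startsWith("//")) {
                continue;
            }
            active.add(line);
        }
        return active;
    }

    // Reads the list from disk, e.g. Path.of("default.sieves").
    static List<String> activeSieves(Path sieveFile) throws IOException {
        return activeSieves(Files.readAllLines(sieveFile));
    }
}
```

The returned list preserves file order, so in this sketch it also gives the order in which the sieves would run; MyNewSieve in the test below is a made-up name.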
You can plug any working Java code into the CAEVO framework. All you need to do is implement the caevo.sieves.Sieve interface, which has only two methods:
public List<TLink> annotate(SieveDocument doc, List<TLink> currentTLinks);
public void train(SieveDocuments infoDocs);
The train method is for learning sieves that require a training stage; rule-based sieves can simply leave it empty. The annotate method takes a single SieveDocument and the list of current TLinks, and returns TLinks for any entity pairs in the document that you wish to annotate. You can even ignore the given list of TLinks if you'd like. CAEVO takes care of redundant annotations for you.
Take a look at the very simple sieve in caevo.sieves.BaselineEventDCT. It looks for any 'said' events, and creates an "is included" TLink with the document time stamp.
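To make the shape of a sieve concrete, the following sketch mirrors the behavior described for BaselineEventDCT using hypothetical stand-in types (ToyDoc, ToyEvent, ToyTLink) in place of CAEVO's SieveDocument and TLink, so it compiles on its own with Java 16+. Matching the surface word "said" and using the relation string "IS_INCLUDED" are simplifying assumptions; the real sieve may work differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins, NOT CAEVO's SieveDocument/TLink classes; they only mirror
// the shape of Sieve.annotate(doc, currentTLinks) and Sieve.train(...).
record ToyEvent(String eid, String word) {}
record ToyDoc(String dctTimexId, List<ToyEvent> events) {}
record ToyTLink(String source, String target, String relation) {}

class SaidIncludedInDctSieve {
    // Every "said" event is linked as IS_INCLUDED in the document time stamp.
    // The current links are ignored, which the framework allows.
    List<ToyTLink> annotate(ToyDoc doc, List<ToyTLink> currentTLinks) {
        List<ToyTLink> proposed = new ArrayList<>();
        for (ToyEvent e : doc.events()) {
            if (e.word().equalsIgnoreCase("said")) {
                proposed.add(new ToyTLink(e.eid(), doc.dctTimexId(), "IS_INCLUDED"));
            }
        }
        return proposed;
    }

    // Rule-based sieve: nothing to learn, so training is a no-op.
    void train(List<ToyDoc> trainingDocs) {}
}
```

In a real sieve the same loop would read events from the SieveDocument and build CAEVO TLink objects; the empty train method matches the advice above for rule-based sieves.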
Once you've created your sieve, you just have to "turn it on". Just edit the default.sieves list in the main directory. You can add your sieve anywhere in the gauntlet that you'd like, and see if it runs.
Need parameters? Does your sieve require dynamic parameters for different runs? Look in default.properties, which holds the parameters for the various sieves. Simply add your own, using the name of your Java class as the prefix. You can query them in your class through the TimeSieveProperties class, which loads these properties at runtime. For example:
TimeSieveProperties.getBoolean("EventEventVagueSieve.considerTense", true);
This looks up the considerTense property to see if the EventEventVague sieve should use tense rules. The "true" parameter is the default value in case the property is not set.
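The prefix-plus-default lookup can be sketched with java.util.Properties. ToySieveProperties is a hypothetical, instance-based class for illustration, not TimeSieveProperties, and it assumes default.properties uses the standard Java key=value properties syntax.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical stand-in for TimeSieveProperties; the real class may load and parse differently.
class ToySieveProperties {
    private final Properties props = new Properties();

    ToySieveProperties(Reader source) {
        try {
            props.load(source); // standard Java key=value properties syntax (assumed)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Keys are "<SieveClassName>.<parameter>"; return the default when the key is unset.
    boolean getBoolean(String key, boolean defaultValue) {
        String value = props.getProperty(key);
        return value == null ? defaultValue : Boolean.parseBoolean(value.strip());
    }
}
```

With a file containing EventEventVagueSieve.considerTense = false, the lookup returns false; for a key that is absent, such as a made-up MyNewSieve.someFlag, it falls back to the supplied default.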
TimeBank-Dense is a corpus of 12,000 temporal links between events and time expressions in 36 TimeBank documents. The dataset is split into training, dev, and test sets; the lists of dev and test documents are defined in variables at the top of the evaluation code on GitHub.
A couple of different versions are available for download.
The TimeBank-Dense annotation required a tool that forces annotators to label all event/timex pairs. We built a custom command-line tool that reads TimeBank-formatted XML documents and prompts the annotator to choose a label for each relevant pair. The tool also has its own transitive inference procedure, skipping pairs whose labels can be inferred from the annotator's previous choices. This greatly speeds up annotation and helps maintain consistency.
The tool, caevo.annotate.Annotator, is available on GitHub alongside CAEVO itself.
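The idea of skipping inferable pairs can be sketched as follows. This is a hypothetical simplification, not the caevo.annotate.Annotator code: it propagates only BEFORE (with AFTER treated as a reversed BEFORE), does not propagate any other label, and performs no consistency check on the annotator's answers. The askAnnotator callback stands in for the command-line prompt.

```java
import java.util.*;
import java.util.function.BiFunction;

// Hypothetical sketch, not the caevo.annotate.Annotator implementation.
class ToyDenseAnnotator {
    // Visits every entity pair and returns the pairs the annotator was actually asked about.
    static List<String> annotate(List<String> entities,
                                 BiFunction<String, String, String> askAnnotator) {
        // before.get(x) holds every entity known to come after x (kept transitively closed).
        Map<String, Set<String>> before = new HashMap<>();
        for (String e : entities) {
            before.put(e, new HashSet<>());
        }
        List<String> asked = new ArrayList<>();
        for (int i = 0; i < entities.size(); i++) {
            for (int j = i + 1; j < entities.size(); j++) {
                String a = entities.get(i);
                String b = entities.get(j);
                if (before.get(a).contains(b) || before.get(b).contains(a)) {
                    continue; // already inferable from earlier answers: no prompt needed
                }
                asked.add(a + "-" + b);
                String label = askAnnotator.apply(a, b);
                if (label.equals("BEFORE")) {
                    addBefore(before, a, b);
                } else if (label.equals("AFTER")) {
                    addBefore(before, b, a);
                }
                // Any other label (e.g. VAGUE) is not propagated in this sketch.
            }
        }
        return asked;
    }

    // Records x BEFORE y while keeping the relation closed: every entity at or
    // before x now precedes every entity at or after y.
    private static void addBefore(Map<String, Set<String>> before, String x, String y) {
        List<String> preds = new ArrayList<>(List.of(x));
        for (Map.Entry<String, Set<String>> entry : before.entrySet()) {
            if (entry.getValue().contains(x)) {
                preds.add(entry.getKey());
            }
        }
        Set<String> succs = new HashSet<>(before.get(y));
        succs.add(y);
        for (String p : preds) {
            before.get(p).addAll(succs);
        }
    }
}
```

If the annotator answers e1 BEFORE e2 and then AFTER for the pair e1-e3 (that is, e3 BEFORE e1), the sketch infers e3 BEFORE e2 and never prompts for the e2-e3 pair.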
This is collaborative work by the following researchers: Nate Chambers, Bill McDowell, Taylor Cassidy, and Steve Bethard. The Center of Excellence at Johns Hopkins sponsored the work during a summer SCALE meeting. It would not have come about without their vision of bringing together students and researchers from a diverse set of locations.
Nathanael Chambers, Bill McDowell, Taylor Cassidy, and Steve Bethard
Dense Event Ordering with a Multi-Pass Architecture.
Transactions of the ACL, to appear, 2014.
Taylor Cassidy, Bill McDowell, Nathanael Chambers, and Steven Bethard
An Annotation Framework for Dense Event Ordering.
ACL-2014, Baltimore, MD. 2014.