Fast query for large treebanks
Abstract:
This is a pre-print of a paper from Human Language Technologies: Proceedings of the 11th Annual Conference of the North American Chapter of the Association for Computational Linguistics (2010), published by the Association for Computational Linguistics. http://naaclhlt2010.isi.edu/
A variety of query systems have been developed for interrogating parsed corpora, or treebanks. With the arrival of efficient, wide-coverage parsers, it is feasible to create very large databases of trees. However, existing approaches that use in-memory search, or relational or XML database technologies, do not scale up. We describe a method for storage, indexing, and query of treebanks that uses an information retrieval engine. Several experiments with a large treebank demonstrate excellent scaling characteristics for a wide range of query types. This work facilitates the curation of much larger treebanks, and enables them to be used effectively in a variety of scientific and engineering tasks.
URL: http://hdl.handle.net/11343/27681
BASE