2. Quality and Efficiency of Manual Annotation: Data from the Pre-annotation Bias Experiment (part of the PDT-C 2.0 project)
BASE
5. PDT-Vallex: Czech Valency lexicon linked to treebanks 4.0 (PDT-Vallex 4.0)
BASE
11. Search for the Relation of Form and Function Using the ForFun Database
In: Prague Bulletin of Mathematical Linguistics, Vol 110, Iss 1, Pp 71-84 (2018)
BASE
14. Difference between Written and Spoken Czech: The Case of Verbal Nouns Denoting an Action
In: Prague Bulletin of Mathematical Linguistics, Vol 107, Iss 1, Pp 19-38 (2017)
BASE
20. Prague Czech-English Dependency Treebank 2.0
Hajič, Jan; Hajičová, Eva; Panevová, Jarmila; Sgall, Petr; Cinková, Silvie; Fučíková, Eva; Mikulová, Marie; Pajas, Petr; Popelka, Jan; Semecký, Jiří; Šindlerová, Jana; Štěpánek, Jan; Toman, Josef; Urešová, Zdeňka; Žabokrtský, Zdeněk. Linguistic Data Consortium, 2012. https://www.ldc.upenn.edu
Abstract:
*Introduction* Prague Czech-English Dependency Treebank (PCEDT) 2.0 was developed by the Institute of Formal and Applied Linguistics at Charles University in Prague, Czech Republic. It is a corpus of Czech-English parallel resources translated, aligned and manually annotated for dependency structure, semantic labeling, argument structure, ellipsis and anaphora resolution. This release updates Prague Czech-English Dependency Treebank 1.0 (LDC2004T25) by adding English newswire texts so that it now contains over two million words in close to 100,000 sentences.

*Data* The principal new material in PCEDT 2.0 is the inclusion of the entire Wall Street Journal data from Treebank-3 (LDC99T42). Not included from PCEDT 1.0 are the Readers Digest material, the Czech monolingual corpus, and the English-Czech dictionary. Each section is enhanced with a comprehensive manual linguistic annotation in the Prague Dependency Treebank style (LDC2006T01, Prague Dependency Treebank 2.0). The main features of this annotation style are:
* dependency structure of the content words and coordinating and similar structures (function words are attached as their attribute values)
* semantic labeling of content words and types of coordinating structures
* argument structure, including an argument structure (valency) lexicon for both languages
* ellipsis and anaphora resolution

This annotation style is called tectogrammatical annotation, and it constitutes the tectogrammatical layer in the corpus. Please consult the PCEDT website for more information and documentation.

*Samples* Please follow this link for a sample of the data included.

*Updates* None at this time.
URL: https://catalog.ldc.upenn.edu/LDC2012T08
BASE