Multext-East: Parallel and Comparable Corpora and Lexicons for Six Central and Eastern European Languages
In: http://www.cs.vassar.edu/~ide/papers/MTE-final.pdf (1998)
Abstract: The EU Copernicus project Multext-East has created a multi-lingual corpus of text and speech data, covering the six languages of the project: Bulgarian, Czech, Estonian, Hungarian, Romanian, and Slovene. In addition, wordform lexicons for each of the languages were developed. The corpus includes a parallel component consisting of Orwell's Nineteen Eighty-Four, with versions in all six languages tagged for part-of-speech and aligned to English (also tagged for POS). We describe the encoding format and data architecture designed especially for this corpus, which is generally usable for encoding linguistic corpora. We also describe the methodology for the development of a harmonized set of morphosyntactic descriptions (MSDs), which builds upon the scheme for western European languages developed within the EAGLES project. We discuss the special concerns for handling the six project languages, which cover three distinct language families.
Keyword: Bulgarian; Czech; Estonian; Hungarian
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.61.8670
http://www.cs.vassar.edu/~ide/papers/MTE-final.pdf
Source: BASE
