
Search in the Catalogues and Directories

Hits 1 – 4 of 4

1
Cleaning the Europarl Corpus for Linguistic Applications
Graën, Johannes [author]; Batinić, Dolores [author]; Volk, Martin [author]. - Hildesheim : Universitätsbibliothek Hildesheim, 2014
DNB Subject Category Language
2
Cleaning the Europarl Corpus for Linguistic Applications ...
Graën, Johannes; Batinić, Dolores; Volk, Martin. - : Stiftung Universität Hildesheim, 2014
BASE
3
Cleaning the Europarl Corpus for Linguistic Applications
Abstract: We discovered several recurring errors in the current version of the Europarl Corpus originating both from the web site of the European Parliament and the corpus compilation based thereon. The most frequent error was incompletely extracted metadata leaving non-textual fragments within the textual parts of the corpus files. This is, on average, the case for every second speaker change. We not only cleaned the Europarl Corpus by correcting several kinds of errors, but also aligned the speakers’ contributions of all available languages and compiled everything into a new XML-structured corpus. This facilitates a more sophisticated selection of data, e.g. querying the corpus for speeches by speakers of a particular political group or in particular language combinations.
Keyword: Computational linguistics; ddc:400; Corpus
URL: https://hildok.bsz-bw.de/frontdoor/index/index/docId/265
https://hildok.bsz-bw.de/files/265/p040.pdf
https://nbn-resolving.org/urn:nbn:de:gbv:hil2-opus-2857
BASE
4
Cleaning the Europarl Corpus for Linguistic Applications
In: Graën, Johannes; Batinić, Dolores; Volk, Martin (2014). Cleaning the Europarl Corpus for Linguistic Applications. In: Konvens 2014, Hildesheim, 8 October 2014 - 10 October 2014.
BASE
