
Search in the Catalogues and Directories

Hits 1 – 11 of 11

1
The ParlaMint corpora of parliamentary proceedings
BASE
2
The ParlaMint corpora of parliamentary proceedings
In: Lang Resour Eval (2022)
BASE
3
Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
BASE
4
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.1
BASE
5
Linguistically annotated multilingual comparable corpora of parliamentary debates ParlaMint.ana 2.0
BASE
6
Multilingual comparable corpora of parliamentary debates ParlaMint 2.0
BASE
7
Multilingual comparable corpora of parliamentary debates ParlaMint 1.0
Abstract: ParlaMint is a multilingual set of comparable corpora containing parliamentary debates mostly starting at the end of 2015 and extending to mid-2020, with each corpus being about 20 million words in size. The sessions in the corpora are marked as belonging to the COVID-19 period (after October 2019), or being "reference" (before that date). The corpora have extensive meta-data about the speakers (name, gender, party affiliation, MP status), are structured into time-stamped terms, sessions and meetings, with each speech being marked by its speaker and their role (chair, regular speaker). The speeches also contain marked-up transcriber comments, such as gaps in the transcription, interruptions, applause, etc. The corpora are encoded according to the Parla-CLARIN TEI recommendation, but have been validated to the compatible but much stricter ParlaMint schemas. The schemas are included in the distribution, along with scripts to convert the corpora into other formats. The ZIP files with the TEI encoded corpora also include the automatically derived plain text version of the corpus, along with metadata on the speeches. In addition to the ParlaMint TEI encoded corpora, their linguistically encoded variants (".ana") are also available. The annotation includes named entities, lemmatisation, part-of-speech tagging, and morphological features and syntactic parses according to the Universal Dependencies recommendations. State-of-the-art tools have been used to perform the annotations. The .ana.zip corpora include the ParlaMint encoded XML, as well as derived formats, in particular, CoNLL-U and vertical files.
Keyword: Bulgarian Parliament; COVID-19; Croatian Parliament; Parla-CLARIN; parliamentary debates; Polish Parliament; Slovenian Parliament; TEI
URL: http://hdl.handle.net/11356/1345
BASE
8
A computational account of multi-word numeral phrases in Polish
In: Investigations into formal Slavic linguistics ; 1. - Frankfurt am Main [et al.] : Lang (2003), 405-415
BLLDB
9
Towards a bi-modular automatic analyzer of large Polish corpora
In: Investigations into formal Slavic linguistics ; 1. - Frankfurt am Main [et al.] : Lang (2003), 363-372
BLLDB
10
AMOR: program automatycznej analizy fleksyjnej tekstu polskiego
In: Polskie Towarzystwo Językoznawcze. Biuletyn Polskiego Towarzystwa Językoznawczego. - Warszawa : Energeia 58 (2002), 175-186
BLLDB
11
Dehomonimizacja i desynkretyzacja w procesie automatycznego przetwarzania wielkich korpusów tekstów polskich
In: Polskie Towarzystwo Językoznawcze. Biuletyn Polskiego Towarzystwa Językoznawczego. - Warszawa : Energeia 58 (2002), 187-199
BLLDB

Catalogues: 0
Bibliographies: 4
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 7