1. Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881-1899)
In: ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora, Jun 2022, Marseille, France ; https://hal.archives-ouvertes.fr/hal-03623351 ; https://www.clarin.eu/ParlaCLARIN-III (2022)
BASE
6. Terminological Methods in Lexicography: Conceptualising, Organising, and Encoding Terms in General Language Dictionaries
BASE
7. Giving Depth to TEI-Based Descriptions of Manuscripts: The Golden Gospel of Ham
In: Aethiopica; Vol. 24 (2021); 175–211 ; 2194-4024 ; 1430-1938 ; 10.15460/aethiopica.24.0 (2022)
BASE
8. Towards an Online Database of Ancient Dramatic Meters
In: FuturoClassico FCl; N. 7 (2021); 143-164 ; 2465-0951 (2022)
BASE
9. Understanding and reading XML ; Comprendre et lire le XML
In: https://halshs.archives-ouvertes.fr/halshs-03637142 ; École thématique. Comprendre et lire le XML, Bibliothèque du lab. CRISCO EA 4255, France. 2021, pp.72 (2021)
BASE
10. XML and namespaces ; XML et espaces de nom
In: https://halshs.archives-ouvertes.fr/halshs-03637189 ; Doctorat. XML et espaces de nom, Bibliothèque du lab. CRISCO EA 4255, France. 2021, pp.44 (2021)
BASE
11. Language Processing in Digital Editions of Russian 18th Century Texts ; Лингвистическая обработка цифровых изданий русских текстов XVIII века
In: Corpora 2021 International Conference, Saint-Petersburg State University, Jul 2021, Saint Petersburg, Russia ; https://halshs.archives-ouvertes.fr/halshs-03285725 ; https://events.spbu.ru/events/corpora-2021 (2021)
BASE
12. La Base de français médiéval et le consortium CAHIER : dix ans d'échanges et de collaborations
In: 10 ans avec CAHIER. Des corpus d'auteurs pour les humanités à leur exploitation numérique, Jun 2021, Bordeaux, France ; https://halshs.archives-ouvertes.fr/halshs-03363517 ; https://cahier10.sciencesconf.org/344494 (2021)
BASE
13. Expanding the content model of annotationBlock
In: Next Gen TEI, 2021 - TEI Conference and Members’ Meeting, Oct 2021, Virtual, United States ; https://hal.archives-ouvertes.fr/hal-03380805 (2021)
BASE
18. Training corpus ssj500k 2.3
Krek, Simon; Dobrovoljc, Kaja; Erjavec, Tomaž; Može, Sara; Ledinek, Nina; Holz, Nanika; Zupan, Katja; Gantar, Polona; Kuzman, Taja; Čibej, Jaka; Arhar Holdt, Špela; Kavčič, Teja; Škrjanec, Iza; Marko, Dafne; Jezeršek, Lucija; Zajc, Anja. Centre for Language Resources and Technologies, University of Ljubljana, 2021
Abstract:
The ssj500k training corpus contains about 500,000 tokens manually annotated on the levels of tokenisation, sentence segmentation, morphosyntactic tagging, and lemmatisation. About half of the corpus is also manually annotated with syntactic dependencies, named entities, and verbal multiword expressions. About a quarter of the corpus is also annotated with semantic role labels. The morphosyntactic tags and syntactic dependencies are included both in the JOS/MULTEXT-East framework and in the framework of Universal Dependencies. The annotations of the ssj500k corpus follow (1) the MULTEXT-East V6 morphosyntactic specifications for Slovene, http://nl.ijs.si/ME/V6/msd/, (2) the JOS dependency schema, http://nl.ijs.si/jos/bib/jos-skladnja-navodila.pdf, (3) the Universal Dependencies morphosyntactic specifications and syntactic dependencies for Slovene-SSJ, https://universaldependencies.org/, (4) the Janes annotation guidelines for Slovenian named entities, http://nl.ijs.si/janes/wp-content/uploads/2017/09/SlovenianNER-eng-v1.1.pdf, and (5) the Guidelines of the PARSEME shared task on verbal multiword expressions, http://parsemefr.lif.univ-mrs.fr/parseme-st-guidelines/1.1/. The vocabulary of (1) and (2) is provided in the back element, and that of (3), (4), and (5) in the teiHeader, of the TEI-encoded corpus. The semantic role labels are also documented in the teiHeader. In contrast to the previous version 2.2, this version includes the corrected Universal Dependencies relations from UD version 2.8, updates the TEI encoding, and adds UD annotations to the vertical file.
Keyword:
CONLL-U; dependency treebank; manual annotation; named entities; parsing; part-of-speech tagging; semantic role labelling; TEI; tokenisation; verbal multiword expressions
URL: http://hdl.handle.net/11356/1434
BASE
20. Multilingual comparable corpora of parliamentary debates ParlaMint 2.1
BASE