2 |
A morph-based and a word-based treebank for Beja
In: SyntaxFest ; TLT 2021 - 20th International Workshop on Treebanks and Linguistic Theories ; https://hal.archives-ouvertes.fr/hal-03494462 ; TLT 2021 - 20th International Workshop on Treebanks and Linguistic Theories, Mar 2022, Sofia, Bulgaria (2022)
BASE
3 |
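Generación de flexión morfológica con UniMorph: Evaluación con base de datos relacional y pautas de entrenamiento [Morphological inflection generation with UniMorph: Evaluation with a relational database and training guidelines]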
In: Procesamiento del lenguaje natural, ISSN 1135-5948, No. 68, 2022, pp. 61-70 (2022)
BASE
9 |
A morph-based and a word-based treebank for Beja
In: SyntaxFest ; https://hal.archives-ouvertes.fr/hal-03494462 ; SyntaxFest, In press (2021)
BASE
10 |
Old Catalan Morphosyntax: developing an annotated corpus
In: EISSN: 2059-481X ; Journal of Open Humanities Data ; https://hal.archives-ouvertes.fr/hal-03617737 ; Journal of Open Humanities Data, Ubiquity Press, 2021, 7, pp.30. ⟨10.5334/johd.54⟩ (2021)
BASE
14 |
Coreference in Universal Dependencies 0.2 (CorefUD 0.2)
Abstract:
CorefUD is a collection of previously existing datasets annotated with coreference, which we converted into a common annotation scheme. In total, CorefUD in its current version 0.2 consists of 17 datasets for 11 languages. The datasets are enriched with automatic morphological and syntactic annotations that are fully compliant with the standards of the Universal Dependencies project. All the datasets are stored in the CoNLL-U format, with coreference- and bridging-specific information captured by attribute-value pairs located in the MISC column. The collection is divided into a public edition and a non-public (ÚFAL-internal) edition. The publicly available edition is distributed via LINDAT-CLARIAH-CZ and contains 13 datasets for 10 languages (1 dataset for Catalan, 2 for Czech, 2 for English, 1 for French, 2 for German, 1 for Hungarian, 1 for Lithuanian, 1 for Polish, 1 for Russian, and 1 for Spanish), excluding the test data. The non-public edition is available internally to ÚFAL members and contains 4 additional datasets for 2 languages (1 dataset for Dutch and 3 for English), which we are not allowed to distribute due to their original license limitations. It also contains the test data portions for all datasets. When using any of the harmonized datasets, please familiarize yourself with its license (placed in the same directory as the data) and cite the original data resource too. Version 0.2 consists of exactly the same datasets as version 0.1. All automatically parsed datasets were re-parsed for v0.2 using UDPipe 2 with models trained on UD 2.6. Catalan-AnCora, Spanish-AnCora and English-GUM have been updated to match their UD 2.9 versions.
Keyword:
bridging relations; coreference; dependency; harmonized annotation; treebank
URL: http://hdl.handle.net/11234/1-4598
BASE