1 |
Collecting and annotating corpora for three under-resourced languages of France: Methodological issues
In: Language Documentation & Conservation, University of Hawaiʻi Press 2021, 15, pp.316-357 ; ISSN: 1934-5275 ; EISSN: 1934-5275 ; https://hal.archives-ouvertes.fr/hal-03273196 ; http://hdl.handle.net/10125/74645 (2021)
BASE
3 |
Free Software Tools for Computational Linguistics: An Overview ...
5 |
Political analytics on election candidates and their parties in context of the US Presidential elections 2020
6 |
A Systematic Literature Review of Lexical Analyzer Implementation Techniques in Compiler Design ...
11 |
Segmentation automatique en périodes pour le français parlé [Automatic segmentation into periods for spoken French]
In: Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles ; Nancy, France, 2020, pp.241-248 ; https://hal.archives-ouvertes.fr/hal-02784773 (2020)
15 |
SQL Translation Based on Query Process for Training School ...
17 |
How can one kill someone twice in Indonesian? Causal pluralism at the syntax-semantics interface
In: Proceedings of the Linguistic Society of America; Vol 5, No 1 (2020); 29–43 ; 2473-8689 (2020)
18 |
CoNLL-Merge: Efficient Harmonization of Concurrent Tokenization and Textual Variation
Chiarcos, Christian; Schenk, Niko. Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019. OASIcs - OpenAccess Series in Informatics. 2nd Conference on Language, Data and Knowledge (LDK 2019), 2019
Abstract:
The proper detection of tokens in running text represents the initial processing step in modular NLP pipelines. But strategies for defining these minimal units can differ, and conflicting analyses of the same text seriously limit the integration of subsequent linguistic annotations into a shared representation. As a solution, we introduce CoNLL Merge, a practical tool for harmonizing TSV-related data models, as they occur, e.g., in multi-layer corpora with non-sequential, concurrent tokenizations, but also in ensemble combinations in Natural Language Processing. CoNLL Merge works unsupervised, requires no manual intervention or external data sources, and comes with a flexible API for fully automated merging routines, validity and sanity checks. Users can choose from several merging strategies: they can preserve a reference tokenization (with possible losses of annotation granularity), create a common tokenization layer consisting of minimal shared subtokens (loss-less in terms of annotation granularity, destructive against a reference tokenization), or present tokenization clashes (loss-less and non-destructive, but introducing empty tokens as place-holders for unaligned elements). We demonstrate the applicability of the tool on two use cases from natural language processing and computational philology.
Keyword:
data heterogeneity; Data processing Computer science; linguistic annotation; merging; tab-separated values (TSV) format; tokenization
URN:
urn:nbn:de:0030-drops-103717
URL: https://drops.dagstuhl.de/opus/volltexte/2019/10371/ https://doi.org/10.4230/OASIcs.LDK.2019.7
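The abstract above lists three merging strategies. The second, building a common layer of minimal shared subtokens, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not CoNLL-Merge's own implementation or API. It assumes both tokenizations are verbatim, in-order substrings of one known raw text, and it finds boundaries by character offsets. The function names `token_spans` and `minimal_subtokens` are hypothetical.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def token_spans(text: str, tokens: List[str]) -> List[Span]:
    """Locate each token in `text`, scanning left to right.

    Assumption (hypothetical helper): tokens are verbatim substrings of
    `text` in reading order; characters between them are skipped.
    """
    spans: List[Span] = []
    pos = 0
    for tok in tokens:
        start = text.find(tok, pos)
        if start < 0:
            raise ValueError(f"token {tok!r} not found after offset {pos}")
        spans.append((start, start + len(tok)))
        pos = start + len(tok)
    return spans


def minimal_subtokens(text: str, tok_a: List[str], tok_b: List[str]) -> List[str]:
    """Cut `text` at every token boundary used by either tokenization."""
    spans = token_spans(text, tok_a) + token_spans(text, tok_b)
    # Every start or end offset from either layer becomes a cut point.
    cuts = sorted({offset for span in spans for offset in span})
    # Offsets covered by at least one token; gaps between tokens are not.
    covered: Set[int] = set()
    for start, end in spans:
        covered.update(range(start, end))
    pieces: List[str] = []
    for left, right in zip(cuts, cuts[1:]):
        piece = text[left:right]
        # Keep segments inside some token, dropping whitespace-only ones
        # (e.g. the space inside a multi-word token such as "New York").
        if left in covered and piece.strip():
            pieces.append(piece)
    return pieces


if __name__ == "__main__":
    print(minimal_subtokens("don't stop", ["don't", "stop"], ["do", "n't", "stop"]))
```

For instance, merging `["don't", "stop"]` with `["do", "n't", "stop"]` over the text `don't stop` yields `['do', "n't", 'stop']`. The coarser layer's token `don't` now spans two subtokens, so its annotations would have to be projected onto both. No annotation granularity is lost, but the original reference tokenization is not preserved, which matches the trade-off the abstract describes for this strategy.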
19 |
Towards Usable and FAIR Software for Arabic Textual Scholarship - Lessons from Kalīla and Dimna Research ...