22 |
How OCR Performance can Impact on the Automatic Extraction of Dictionary Content Structures
|
|
|
|
In: 19th annual Conference and Members’ Meeting of the Text Encoding Initiative Consortium (TEI) - What is text, really? TEI and beyond ; https://hal.archives-ouvertes.fr/hal-02263276 ; 19th annual Conference and Members’ Meeting of the Text Encoding Initiative Consortium (TEI) - What is text, really? TEI and beyond, Sep 2019, Graz, Austria (2019)
|
|
BASE
|
|
23 |
Asynchronous Pipeline for Processing Huge Corpora on Medium to Low Resource Infrastructures
|
|
|
|
In: 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7) ; https://hal.inria.fr/hal-02148693 ; 7th Workshop on the Challenges in the Management of Large Corpora (CMLC-7), Jul 2019, Cardiff, United Kingdom. ⟨10.14618/IDS-PUB-9021⟩ (2019)
|
|
BASE
|
|
24 |
Nénufar: Modelling a Diachronic Collection of Dictionary Editions as a Computational Lexical Resource
|
|
|
|
In: ELEX 2019: smart lexicography ; https://hal.inria.fr/hal-02272978 ; ELEX 2019: smart lexicography, Oct 2019, Sintra, Portugal (2019)
|
|
BASE
|
|
25 |
LMF Reloaded
|
|
|
|
In: AsiaLex 2019: Past, Present and Future ; https://hal.inria.fr/hal-02118319 ; AsiaLex 2019: Past, Present and Future, Jun 2019, Istanbul, Turkey (2019)
|
|
BASE
|
|
26 |
TEI Encoding of a Classical Mixtec Dictionary Using GROBID-Dictionaries
|
|
|
|
In: ELEX 2019: Smart Lexicography ; https://hal.inria.fr/hal-02264033 ; ELEX 2019: Smart Lexicography, Oct 2019, Sintra, Portugal ; https://elex.link/elex2019/ (2019)
|
|
Abstract:
This paper presents the application of GROBID-Dictionaries (Khemakhem et al. 2017, Khemakhem et al. 2018a, Khemakhem et al. 2018b, Khemakhem et al. 2018c), an open source machine learning system for automatically structuring print dictionaries in digital format into TEI (Text Encoding Initiative), to a historical lexical resource of Colonial Mixtec, 'Voces del Dzaha Dzahui', published by the Dominican fray Francisco Alvarado in 1593. GROBID-Dictionaries was applied to a reorganized and modernized version of the historical resource published by Jansen and Perez Jiménez (2009). The resulting TEI dictionary will be integrated into a language documentation project dealing with Mixtepec-Mixtec (ISO 639-3: mix) (Bowers & Romary, 2017, 2018a, 2018b), an under-resourced indigenous language native to the Juxtlahuaca district of Oaxaca, Mexico.
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL]; GROBID-Dictionaries; Mixtec; TEI
|
|
URL: https://hal.inria.fr/hal-02264033/file/eLex_2019_abstract_111.pdf https://hal.inria.fr/hal-02264033 https://hal.inria.fr/hal-02264033/document
|
|
BASE
|
|
27 |
CamemBERT: a Tasty French Language Model
|
|
|
|
In: https://hal.inria.fr/hal-02445946 ; 2019 (2019)
|
|
BASE
|
|
28 |
TEI and the Mixtepec-Mixtec corpus: data integration, annotation and normalization of heterogeneous data for an under-resourced language
|
|
|
|
In: 6th International Conference on Language Documentation and Conservation (ICLDC) ; https://hal.inria.fr/hal-02075475 ; 6th International Conference on Language Documentation and Conservation (ICLDC), Feb 2019, Honolulu, United States (2019)
|
|
BASE
|
|
29 |
Preparing the Dictionnaire Universel for Automatic Enrichment
|
|
|
|
In: 10th International Conference on Historical Lexicography and Lexicology (ICHLL) ; https://hal.inria.fr/hal-02131598 ; 10th International Conference on Historical Lexicography and Lexicology (ICHLL), Jun 2019, Leeuwarden, Netherlands ; https://easychair.org/smart-program/ICHLL-10/ (2019)
|
|
BASE
|
|
30 |
Connecting the Humanities through Research Infrastructures
|
|
|
|
In: 4th Digital Humanities in the Nordic Countries (DHN 2019) ; https://hal.inria.fr/hal-02047512 ; 4th Digital Humanities in the Nordic Countries (DHN 2019), Mar 2019, Copenhagen, Denmark ; https://cst.dk/DHN2019/DHN2019.html (2019)
|
|
BASE
|
|
31 |
The place of lexicography in (computer) science
|
|
|
|
In: The Future of Academic Lexicography: Linguistic Knowledge Codification in the Era of Big Data and AI ; https://hal.inria.fr/hal-02358218 ; The Future of Academic Lexicography: Linguistic Knowledge Codification in the Era of Big Data and AI, Frieda Steurs; Dirk Geeraerts; Niels Schiller; Marian Klamer; Iztok Kosem, Nov 2019, Leiden, Netherlands ; https://www.lorentzcenter.nl/lc/web/2019/1177/program.php3?wsid=1177&venue=Oort (2019)
|
|
BASE
|
|
32 |
Asynchronous pipelines for processing huge corpora on medium to low resource infrastructures ...
|
|
|
|
BASE
|
|
33 |
From disparate disciplines to unity in diversity. How the PARTHENOS project brings Humanities Research Infrastructures together ...
|
|
|
|
BASE
|
|
34 |
From disparate disciplines to unity in diversity. How the PARTHENOS project brings Humanities Research Infrastructures together ...
|
|
|
|
BASE
|
|
38 |
Automatic TEI encoding of manuscripts catalogues with GROBID-Dictionaries ...
|
|
|
|
BASE
|
|
39 |
TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources ...
|
|
|
|
BASE
|
|
40 |
Automatic TEI encoding of manuscripts catalogues with GROBID-Dictionaries ...
|
|
|
|
BASE
|
|
|
|