Automatic Normalisation of Early Modern French
In: https://hal.inria.fr/hal-03540226 (2022)

From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French
In: Proceedings of the 13th Language Resources and Evaluation Conference, European Language Resources Association, Jun 2022, Marseille, France (2022) ; https://hal.inria.fr/hal-03596653
Details: 8 pages, 2 figures, 4 tables ; International audience
Abstract: Language models for historical states of language are becoming increasingly important for the optimal digitisation and analysis of old textual sources. Because these historical states are both more complex to process and scarcer in the available corpora, specific efforts are needed to train natural language processing (NLP) tools adapted to the data. In this paper, we present our efforts to develop NLP tools for Early Modern French (historical French from the 16th to the 18th centuries). We present the FreEMmax corpus of Early Modern French and D'AlemBERT, a RoBERTa-based language model trained on FreEMmax. We evaluate the usefulness of D'AlemBERT by fine-tuning it on a part-of-speech tagging task, where it outperforms previous work on the test set. Importantly, we find evidence for the transfer learning capacity of the language model, since its performance on lesser-resourced time periods appears to have been boosted by the better-resourced ones. We release D'AlemBERT and the open-source subpart of the FreEMmax corpus.
Keywords: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Corpus creation; Digital humanities; Early Modern French; Language modelling; Less-resourced languages; Neural language representation models; POS tagging
URL: https://hal.inria.fr/hal-03596653

Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 2021, Bangkok, Thailand (2021) ; https://hal.inria.fr/hal-03243380

Few-shot learning through contextual data augmentation
In: EACL 2021 - 16th Conference of the European Chapter of the Association for Computational Linguistics, Apr 2021, Kiev / Virtual, Ukraine (2021) ; https://hal.inria.fr/hal-03121971

Variation graphique dans les documents d'Ancien Régime : Nouvelles approches scriptométriques [Spelling variation in Ancien Régime documents: new scriptometric approaches]
In: Journée d’étude : « Pour une histoire de la langue ‘par en bas’: textes privés et variation des langues dans le passé » [Study day: "Towards a history of language 'from below': private texts and language variation in the past"], Sep 2021, Paris, France (2021) ; https://hal.inria.fr/hal-03357080

[Book Review] Understanding Dialogue: Language Use and Social Interaction
In: Computational Linguistics, Massachusetts Institute of Technology Press (MIT Press), in press (2021) ; ISSN: 0891-2017 ; EISSN: 1530-9312 ; https://hal.inria.fr/hal-03324500

Expanding the content model of annotationBlock
In: Next Gen TEI, 2021 - TEI Conference and Members’ Meeting, Oct 2021, Virtual, United States (2021) ; https://hal.archives-ouvertes.fr/hal-03380805

Document Sub-structure in Neural Machine Translation
In: Proceedings of the 12th Language Resources and Evaluation Conference, 2020, Marseille, France, pp. 3657-3667 (2020) ; https://hal.archives-ouvertes.fr/hal-02900568

DiaBLa: A Corpus of Bilingual Spontaneous Written Dialogues for Machine Translation
In: Language Resources and Evaluation, Springer Verlag, 2020, doi:10.1007/s10579-020-09514-4 (2020) ; ISSN: 1574-020X ; EISSN: 1574-0218 ; https://hal.inria.fr/hal-03021633

Architecture of a Scalable, Secure and Resilient Translation Platform for Multilingual News Media
In: Proceedings of the 1st International Workshop on Language Technology Platforms, 2020, Marseille, France, pp. 16-21 (2020) ; https://hal.archives-ouvertes.fr/hal-02900633

ParBLEU: Augmenting Metrics with Automatic Paraphrases for the WMT'20 Metrics Shared Task
In: Proceedings of the 5th Conference on Machine Translation, Nov 2020, Online (2020) ; https://hal.archives-ouvertes.fr/hal-02981143

The University of Edinburgh’s English-Tamil and English-Inuktitut Submissions to the WMT20 News Translation Task
In: Proceedings of the 5th Conference on Machine Translation, Nov 2020, Online (2020) ; https://hal.archives-ouvertes.fr/hal-02981153

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages
In: Proceedings of the 5th Conference on Machine Translation, 2020, Online (2020) ; https://hal.inria.fr/hal-02986356

The University of Edinburgh-Uppsala University’s Submission to the WMT 2020 Chat Translation Task
In: Proceedings of the 5th Conference on Machine Translation, Nov 2020, Online (2020) ; https://hal.archives-ouvertes.fr/hal-02981159

Findings of the WMT 2020 Biomedical Translation Shared Task: Basque, Italian and Russian as New Additional Languages
In: Fraunhofer FOKUS ; Fraunhofer IBMT (2020)

The University of Edinburgh’s Submissions to the WMT19 News Translation Task
In: 4th Conference on Machine Translation, 2019, Florence, Italy (2019) ; https://hal.inria.fr/hal-02986330