1 |
Fine-tuning pre-trained models for Automatic Speech Recognition: experiments on a fieldwork corpus of Japhug (Trans-Himalayan family)
|
|
Guillaume, Séverine; Wisniewski, Guillaume; Macaire, Cécile; Jacques, Guillaume; Michaud, Alexis; Galliot, Benjamin; Coavoux, Maximin; Rossato, Solange; Nguyễn, Minh-Châu; Fily, Maxime
|
|
In: https://halshs.archives-ouvertes.fr/halshs-03647315 (2022)
|
|
Abstract:
Accepted for publication in Proceedings of ComputEL-5: Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages. This is a report on results obtained in developing speech recognition tools intended to support linguistic documentation efforts. The test case is an extensive fieldwork corpus of Japhug, an endangered language of the Trans-Himalayan (Sino-Tibetan) family. The goal is to reduce the transcription workload of field linguists. The method is a deep learning approach based on language-specific fine-tuning of XLS-R, a generic pre-trained speech representation model built on a Transformer architecture. We note implementation difficulties in terms of training stability. Nonetheless, this approach brings significant improvements: the quality of phonemic transcription improves over earlier experiments and, most significantly, the new approach makes automatic word recognition achievable. A subjective evaluation of the tool by the author of the training data confirms the usefulness of this approach.
|
|
Keyword:
[SHS.LANGUE] Humanities and Social Sciences/Linguistics; Automatic Speech Recognition
|
|
URL: https://halshs.archives-ouvertes.fr/halshs-03647315/file/ComputEL_5_Japhug_ASR.pdf https://halshs.archives-ouvertes.fr/halshs-03647315/document https://halshs.archives-ouvertes.fr/halshs-03647315
|
|
|
|
7 |
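Contribution d'informations syntaxiques aux capacités de généralisation compositionelle des modèles seq2seq convolutifs [Contribution of syntactic information to the compositional generalization abilities of convolutional seq2seq models]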
|
|
|
|
In: Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale, 2021, Lille, France. pp.134-141 ; https://hal.archives-ouvertes.fr/hal-03265890 (2021)
|
|
|
|
8 |
BERT-Proof Syntactic Structures: Investigating Errors in Discontinuous Constituency Parsing
|
|
|
|
In: Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (ACL-IJCNLP 2021), Association for Computational Linguistics, Aug 2021, Online. pp.3259-3272, ⟨10.18653/v1/2021.findings-acl.288⟩ ; https://hal.archives-ouvertes.fr/hal-03339847 ; https://2021.aclweb.org/ (2021)
|
|
|
|
9 |
Self-Supervised and Controlled Multi-Document Opinion Summarization
|
|
|
|
In: Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Apr 2021, Online. pp.1646-1662 ; https://hal.archives-ouvertes.fr/hal-03241932 (2021)
|
|
|
|
10 |
FlauBERT: Unsupervised Language Model Pre-training for French
|
|
|
|
In: Proceedings of the 12th Language Resources and Evaluation Conference (LREC), Marseille, France ; https://hal.archives-ouvertes.fr/hal-02890258 (2020)
|
|
|
|
11 |
FlauBERT: Unsupervised Language Model Pre-training for French ; FlauBERT : des modèles de langue contextualisés pré-entraînés pour le français [FlauBERT: pre-trained contextualized language models for French]
|
|
|
|
In: Actes de la 6e conférence conjointe Journées d'Études sur la Parole (JEP, 33e édition), Traitement Automatique des Langues Naturelles (TALN, 27e édition), Rencontre des Étudiants Chercheurs en Informatique pour le Traitement Automatique des Langues (RÉCITAL, 22e édition). Volume 2 : Traitement Automatique des Langues Naturelles, Jun 2020, Nancy, France. pp.268-278 ; https://hal.archives-ouvertes.fr/hal-02784776 (2020)
|
|
|
|
12 |
Unlexicalized Transition-based Discontinuous Constituency Parsing
|
|
|
|
In: Transactions of the Association for Computational Linguistics, The MIT Press, vol. 7, 2019, pp.73-89. ⟨10.1162/tacl_a_00255⟩ ; EISSN: 2307-387X ; https://hal.archives-ouvertes.fr/hal-02150073 (2019)
|
|
|
|
13 |
Discontinuous Constituency Parsing with a Stack-Free Transition System and a Dynamic Oracle
|
|
|
|
In: 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT 2019), Jun 2019, Minneapolis, MN, United States. pp.204-217 ; https://hal.archives-ouvertes.fr/hal-02150076 ; https://naacl2019.org/ (2019)
|
|
|
|
14 |
Unlexicalized Transition-based Discontinuous Constituency Parsing ...
|
|
|
|
|
|
15 |
Unlexicalized Transition-based Discontinuous Constituency Parsing
|
|
|
|
In: Transactions of the Association for Computational Linguistics, vol. 7, pp.73-89 (2019)
|
|
|
|
16 |
Privacy-preserving Neural Representations of Text
|
|
|
|
In: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Nov 2018, Brussels, Belgium. pp.1-10 ; https://hal.archives-ouvertes.fr/hal-02135081 (2018)
|
|
|
|
18 |
Neural Greedy Constituent Parsing with Dynamic Oracles
|
|
|
|
In: Association for Computational Linguistics (ACL), 2016, Berlin, Germany ; https://hal.inria.fr/hal-01353734 (2016)
|
|