1. Fine-tuning pre-trained models for Automatic Speech Recognition: experiments on a fieldwork corpus of Japhug (Trans-Himalayan family)
Guillaume, Séverine; Wisniewski, Guillaume; Macaire, Cécile; Jacques, Guillaume; Michaud, Alexis; Galliot, Benjamin; Coavoux, Maximin; Rossato, Solange; Nguyễn, Minh-Châu; Fily, Maxime
In: https://halshs.archives-ouvertes.fr/halshs-03647315 ; 2022 (2022)
Abstract:
Accepted for publication in Proceedings of ComputEL-5: Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages. This report presents results from the development of speech recognition tools intended to support linguistic documentation. The test case is an extensive fieldwork corpus of Japhug, an endangered language of the Trans-Himalayan (Sino-Tibetan) family, and the goal is to reduce the transcription workload of field linguists. The method is a deep learning approach that fine-tunes a generic pre-trained representation model, XLS-R, to the target language, using a Transformer architecture. Implementation proved difficult in terms of training stability, but the approach nonetheless brings significant improvements: the quality of phonemic transcription improves over earlier experiments, and, most significantly, the new approach makes automatic word recognition achievable. A subjective evaluation of the tool by the author of the training data confirms its usefulness.
Keyword:
[SHS.LANGUE]Humanities and Social Sciences/Linguistics; Automatic Speech Recognition
URL: https://halshs.archives-ouvertes.fr/halshs-03647315/file/ComputEL_5_Japhug_ASR.pdf ; https://halshs.archives-ouvertes.fr/halshs-03647315/document ; https://halshs.archives-ouvertes.fr/halshs-03647315
3. Are Neural Networks Extracting Linguistic Properties or Memorizing Training Data? An Observation with a Multilingual Probe for Predicting Tense
In: EACL 2021 ; https://halshs.archives-ouvertes.fr/halshs-03197072 ; EACL 2021, Apr 2021, Kiev (online), Ukraine (2021)
4. Gender Bias in Neural Translation: a preliminary study ; Biais de genre dans un système de traduction automatique neuronale : une étude préliminaire
In: Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles. Volume 1 : conférence principale ; Traitement Automatique des Langues Naturelles ; https://hal.archives-ouvertes.fr/hal-03265895 ; Traitement Automatique des Langues Naturelles, 2021, Lille, France. pp.11-25 ; https://talnrecital2021.inria.fr/ (2021)
5. Screening Gender Transfer in Neural Machine Translation
In: Proceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP ; Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP ; https://hal.archives-ouvertes.fr/hal-03424174 ; Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP, Association for Computational Linguistics, Nov 2021, Punta Cana, Dominican Republic ; https://blackboxnlp.github.io/ (2021)
6. The SPECTRANS System Description for the WMT21 Terminology Task
In: Proceedings of the Sixth Conference on Machine Translation ; EMNLP 2021 Sixth Conference on Machine Translation (WMT21) ; https://hal.archives-ouvertes.fr/hal-03574680 ; EMNLP 2021 Sixth Conference on Machine Translation (WMT21), ACL, Nov 2021, Punta Cana, Dominican Republic. pp.815-820 ; https://aclanthology.org/events/wmt-2021/ (2021)
7. User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis
In: ComputEL-4: Fourth Workshop on the Use of Computational Methods in the Study of Endangered Languages ; https://halshs.archives-ouvertes.fr/halshs-03030529 ; ComputEL-4: Fourth Workshop on the Use of Computational Methods in the Study of Endangered Languages, Mar 2021, Hawai‘i, United States (2021)
8. Noisy UGC Translation at the Character Level: Revisiting Open-Vocabulary Capabilities and Robustness of Char-Based Models
In: W-NUT 2021 - 7th Workshop on Noisy User-generated Text (colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03540174 ; W-NUT 2021 - 7th Workshop on Noisy User-generated Text (colocated with EMNLP 2021), Association for Computational Linguistics, Nov 2021, Punta Cana, Dominican Republic (2021)
9. Understanding the Impact of UGC Specificities on Translation Quality
In: W-NUT 2021 - Seventh Workshop on Noisy User-generated Text (colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03540175 ; W-NUT 2021 - Seventh Workshop on Noisy User-generated Text (colocated with EMNLP 2021), Association for Computational Linguistics, Nov 2021, Punta Cana, Dominican Republic (2021)
10. Are Transformers a Modern Version of ELIZA? Observations on French Object Verb Agreement ...
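12. La transcription du linguiste au miroir de l’intelligence artificielle : réflexions à partir de la transcription phonémique automatique [The linguist's transcription as reflected in artificial intelligence: reflections based on automatic phonemic transcription]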
In: ISSN: 0037-9069 ; EISSN: 1783-1385 ; Bulletin de la Société de Linguistique de Paris ; https://halshs.archives-ouvertes.fr/halshs-02881731 ; Bulletin de la Société de Linguistique de Paris, Peeters Publishers, 2020, 116 (1) (2020)
13. Ouvrir aux linguistes « de terrain » un accès à la transcription automatique [Opening up access to automatic transcription for fieldwork linguists]
In: Actes des 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT). ; 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT) ; https://hal.archives-ouvertes.fr/hal-03047148 ; 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT), 2020, Montrouge, France. pp.83-94 (2020)
14. User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis
In: ComputEL-4: Fourth Workshop on the Use of Computational Methods in the Study of Endangered Languages ; https://halshs.archives-ouvertes.fr/halshs-03030529 ; 2020 ; https://computel-workshop.org/ (2020)
15. Phonemic transcription of low-resource languages: To what extent can preprocessing be automated?
In: 1st Joint SLTU (Spoken Language Technologies for Under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) Workshop ; https://halshs.archives-ouvertes.fr/hal-02513914 ; 1st Joint SLTU (Spoken Language Technologies for Under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) Workshop, 2020, Marseille, France. pp.306-315 ; https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/SLTUCCURLbook.pdf (2020)
19. How Bad are PoS Tagger in Cross-Corpora Settings? Evaluating Annotation Divergence in the UD Project
In: Proceedings of NAACL-HLT 2019 ; 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies ; https://hal.archives-ouvertes.fr/hal-02055137 ; 2019 Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Jun 2019, Minneapolis, Minnesota, United States. pp.218-227 (2019)
20. A Comparison between NMT and PBSMT Performance for Translating Noisy User-Generated Content
In: The 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19) ; https://hal.archives-ouvertes.fr/hal-02270524 ; The 22nd Nordic Conference on Computational Linguistics (NoDaLiDa’19), Sep 2019, Turku, Finland ; https://nodalida2019.org/index.html (2019)