2 |
Fine-tuning pre-trained models for Automatic Speech Recognition: experiments on a fieldwork corpus of Japhug (Trans-Himalayan family)
Guillaume, Séverine; Wisniewski, Guillaume; Macaire, Cécile; Jacques, Guillaume; Michaud, Alexis; Galliot, Benjamin; Coavoux, Maximin; Rossato, Solange; Nguyễn, Minh-Châu; Fily, Maxime
In: https://halshs.archives-ouvertes.fr/halshs-03647315 ; 2022 (2022)
Abstract:
Accepted for publication in Proceedings of ComputEL-5: Fifth Workshop on the Use of Computational Methods in the Study of Endangered Languages ; This is a report on results obtained in the development of speech recognition tools intended to support linguistic documentation efforts. The test case is an extensive fieldwork corpus of Japhug, an endangered language of the Trans-Himalayan (Sino-Tibetan) family. The goal is to reduce the transcription workload of field linguists. The method used is a deep learning approach based on the language-specific tuning of a generic pre-trained representation model, XLS-R, using a Transformer architecture. We note difficulties in implementation, in terms of learning stability. But this approach brings significant improvements nonetheless. The quality of phonemic transcription is improved over earlier experiments; and most significantly, the new approach allows for reaching the stage of automatic word recognition. Subjective evaluation of the tool by the author of the training data confirms the usefulness of this approach.
Keyword:
[SHS.LANGUE]Humanities and Social Sciences/Linguistics; Automatic Speech Recognition
URL: https://halshs.archives-ouvertes.fr/halshs-03647315/file/ComputEL_5_Japhug_ASR.pdf https://halshs.archives-ouvertes.fr/halshs-03647315 https://halshs.archives-ouvertes.fr/halshs-03647315/document
3 |
User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis
In: ComputEL-4: Fourth Workshop on the Use of Computational Methods in the Study of Endangered Languages ; https://halshs.archives-ouvertes.fr/halshs-03030529 ; ComputEL-4: Fourth Workshop on the Use of Computational Methods in the Study of Endangered Languages, Mar 2021, Hawai‘i, United States (2021)
4 |
Yongning Na for Natural Language Processing: a single-speaker audio corpus with transcriptions ...
6 |
Japhug for Natural Language Processing: a single-speaker audio corpus with transcriptions ...
9 |
La transcription du linguiste au miroir de l’intelligence artificielle : réflexions à partir de la transcription phonémique automatique
In: ISSN: 0037-9069 ; EISSN: 1783-1385 ; Bulletin de la Société de Linguistique de Paris ; https://halshs.archives-ouvertes.fr/halshs-02881731 ; Bulletin de la Société de Linguistique de Paris, Peeters Publishers, 2020, 116 (1) (2020)
10 |
Ouvrir aux linguistes « de terrain » un accès à la transcription automatique
In: Actes des 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT). ; 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT) ; https://hal.archives-ouvertes.fr/hal-03047148 ; 2èmes journées scientifiques du Groupement de Recherche Linguistique Informatique Formelle et de Terrain (LIFT), 2020, Montrouge, France. pp.83-94 (2020)
11 |
User-friendly automatic transcription of low-resource languages: Plugging ESPnet into Elpis
In: ComputEL-4: Fourth Workshop on the Use of Computational Methods in the Study of Endangered Languages ; https://halshs.archives-ouvertes.fr/halshs-03030529 ; 2020 ; https://computel-workshop.org/ (2020)
12 |
Phonemic transcription of low-resource languages: To what extent can preprocessing be automated?
In: 1st Joint SLTU (Spoken Language Technologies for Under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) Workshop ; https://halshs.archives-ouvertes.fr/hal-02513914 ; 1st Joint SLTU (Spoken Language Technologies for Under-resourced languages) and CCURL (Collaboration and Computing for Under-Resourced Languages) Workshop, 2020, Marseille, France. pp.306-315 ; https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/SLTUCCURLbook.pdf (2020)
16 |
Phonetic lessons from automatic phonemic transcription: preliminary reflections on Na (Sino-Tibetan) and Tsuut’ina (Dene) data
In: ICPhS XIX (19th International Congress of Phonetic Sciences) ; https://halshs.archives-ouvertes.fr/halshs-02059313 ; ICPhS XIX (19th International Congress of Phonetic Sciences), Aug 2019, Melbourne, Australia ; https://icphs2019.org/icphs2019-fullpapers/ (2019)
17 |
La Collection Pangloss : s’ouvrir à la science… et au reste du monde
In: Symposium 2019 du Laboratoire d'excellence "Fondements empiriques de la linguistique / Empirical Foundations of Linguistics" (Labex EFL) ; https://halshs.archives-ouvertes.fr/halshs-02156809 ; Symposium 2019 du Laboratoire d'excellence "Fondements empiriques de la linguistique / Empirical Foundations of Linguistics" (Labex EFL), Jun 2019, Paris, France. 2019 (2019)
20 |
Integrating automatic transcription into the language documentation workflow: Experiments with Na data and the Persephone toolkit
In: ISSN: 1934-5275 ; EISSN: 1934-5275 ; Language Documentation & Conservation ; https://halshs.archives-ouvertes.fr/halshs-01841979 ; Language Documentation & Conservation, University of Hawaiʻi Press 2018, 12, pp.393-429 ; hdl.handle.net/10125/24793 (2018)