1 |
MasakhaNER: Named entity recognition for African languages
|
|
|
|
In: EISSN: 2307-387X ; Transactions of the Association for Computational Linguistics ; https://hal.inria.fr/hal-03350962 ; Transactions of the Association for Computational Linguistics, The MIT Press, 2021, ⟨10.1162/tacl⟩ (2021)
|
|
|
|
2 |
Speech technology for unwritten languages
|
|
|
|
In: ISSN: 2329-9290 ; EISSN: 2329-9304 ; IEEE/ACM Transactions on Audio, Speech and Language Processing ; https://hal.inria.fr/hal-02480675 ; IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2020, ⟨10.1109/TASLP.2020.2973896⟩ (2020)
|
|
|
|
3 |
AlloVera: a multilingual allophone database
|
|
|
|
In: LREC 2020: 12th Language Resources and Evaluation Conference ; https://halshs.archives-ouvertes.fr/halshs-02527046 ; LREC 2020: 12th Language Resources and Evaluation Conference, European Language Resources Association, May 2020, Marseille, France ; https://lrec2020.lrec-conf.org/ (2020)
|
|
|
|
5 |
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the “Speaking rosetta” JSALT 2017 workshop
|
|
|
|
In: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-01709578 ; ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada (2018)
|
|
|
|
6 |
Evaluating phonemic transcription of low-resource tonal languages for language documentation
|
|
|
|
In: LREC 2018 (Language Resources and Evaluation Conference) ; https://halshs.archives-ouvertes.fr/halshs-01709648 ; LREC 2018 (Language Resources and Evaluation Conference), May 2018, Miyazaki, Japan. pp.3356-3365 (2018)
|
|
|
|
7 |
Integrating automatic transcription into the language documentation workflow: Experiments with Na data and the Persephone toolkit
|
|
|
|
In: ISSN: 1934-5275 ; EISSN: 1934-5275 ; Language Documentation & Conservation ; https://halshs.archives-ouvertes.fr/halshs-01841979 ; Language Documentation & Conservation, University of Hawaiʻi Press 2018, 12, pp.393-429 ; hdl.handle.net/10125/24793 (2018)
|
|
Abstract:
Automatic speech recognition tools have potential for facilitating language documentation, but in practice these tools remain little-used by linguists for a variety of reasons, such as that the technology is still new (and evolving rapidly), user-friendly interfaces are still under development, and case studies demonstrating the practical usefulness of automatic recognition in a low-resource setting remain few. This article reports on a success story in integrating automatic transcription into the language documentation workflow, specifically for Yongning Na, a language of Southwest China. Using PERSEPHONE, an open-source toolkit, a single-speaker speech transcription tool was trained over five hours of manually transcribed speech. The experiments found that this method can achieve a remarkably low error rate (on the order of 17%), and that automatic transcriptions were useful as a canvas for the linguist. The present report is intended for linguists with little or no knowledge of speech processing. It aims to provide insights into (i) the way the tool operates and (ii) the process of collaborating with natural language processing specialists. Practical recommendations are offered on how to anticipate the requirements of this type of technology from the early stages of data collection in the field.
|
|
Keyword:
[SHS.LANGUE]Humanities and Social Sciences/Linguistics; automatic speech recognition; automatic speech transcription; endangered languages; interdisciplinarity; language documentation; multimedia corpora; natural language processing; open access; open-source software; sound archive
|
|
URL: https://halshs.archives-ouvertes.fr/halshs-01841979v2/document https://halshs.archives-ouvertes.fr/halshs-01841979v2/file/AutomaticTranscription_Persephone_LDC2018.pdf https://halshs.archives-ouvertes.fr/halshs-01841979
|
|
|
|
10 |
Phonemic transcription of low-resource tonal languages
|
|
|
|
In: ISSN: 1834-7037 ; Australasian Language Technology Association Workshop 2017 ; https://halshs.archives-ouvertes.fr/halshs-01656683 ; Australasian Language Technology Association Workshop 2017, Dec 2017, Brisbane, Australia. pp.53-60 (2017)
|
|
|
|
|
|