1 |
Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages
|
|
|
|
In: Workshop on Computational Research in Phonetics, Phonology, and Morphology ; https://hal.archives-ouvertes.fr/hal-01910757 ; Workshop on Computational Research in Phonetics, Phonology, and Morphology, Oct 2018, Bruxelles, Belgium. pp.32 - 42, ⟨10.18653/v1/P17⟩ (2018)
|
|
|
|
2 |
Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville)
|
|
|
|
In: 11th edition of the Language Resources and Evaluation Conference (LREC 2018) ; https://hal.archives-ouvertes.fr/hal-01710043 ; 11th edition of the Language Resources and Evaluation Conference (LREC 2018), ELRA, May 2018, Miyazaki, Japan (2018)
|
|
|
|
3 |
A corpus based study of morpheme deletion in a low resourced language: A case study for Embosi
|
|
|
|
In: Annual Meeting of the Linguistic Society of America ; https://hal.archives-ouvertes.fr/hal-01837164 ; Annual Meeting of the Linguistic Society of America, Jan 2018, Salt Lake City, United States (2018)
|
|
|
|
4 |
Developing an Embosi (Bantu C25) Speech Variant Dictionary to Model Vowel Elision and Morpheme Deletion
|
|
|
|
In: Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-01837178 ; Annual Conference of the International Speech Communication Association , ISCA, Aug 2017, Stockholm, Sweden (2017)
|
|
|
|
5 |
Corpus based linguistic exploration via forced alignments with a ‘light-weight’ ASR tool
|
|
|
|
In: Language & Technology Conference : Human Language Technologies as a Challenge for Computer Science and Linguistics ; https://hal.archives-ouvertes.fr/hal-01837174 ; Language & Technology Conference : Human Language Technologies as a Challenge for Computer Science and Linguistics, Nov 2017, Poznań, Poland (2017)
|
|
|
|
6 |
LIG-AIKUMA: a Mobile App to Collect Parallel Speech for Under-Resourced Language Studies
|
|
|
|
In: Interspeech 2016 proceedings ; Interspeech 2016 (short demo paper) ; https://hal.archives-ouvertes.fr/hal-01350062 ; Interspeech 2016 (short demo paper), Sep 2016, San-Francisco, United States (2016)
|
|
|
|
7 |
BULB: Breaking the Unwritten Language Barrier
|
|
|
|
In: Procedia Computer Science ; Computational Methods for Endangered Language Documentation and Description ; https://hal.archives-ouvertes.fr/hal-01836496 ; Computational Methods for Endangered Language Documentation and Description, May 2016, Yogyakarta, Indonesia. pp.8-14, ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
|
|
8 |
Breaking the unwritten language barrier: the BULB project
|
|
|
|
In: SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages ; https://halshs.archives-ouvertes.fr/halshs-01428027 ; SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, May 2016, Yogyakarta, Indonesia. ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
|
|
9 |
Preliminary Experiments on Unsupervised Word Discovery in Mboshi
|
|
|
|
In: Interspeech 2016 proceedings ; Interspeech 2016 ; https://hal.archives-ouvertes.fr/hal-01350119 ; Interspeech 2016, Sep 2016, San-Francisco, United States (2016)
|
|
|
|
10 |
Innovative technologies for under-resourced language documentation: The BULB Project
|
|
|
|
In: CCURL proceedings ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC ; https://hal.archives-ouvertes.fr/hal-01350124 ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC, May 2016, Portoroz, Slovenia (2016)
|
|
|
|
14 |
Automatic language identity tagging on word and sentence-level in multilingual text sources: a case-study on Luxembourgish
|
|
|
|
In: International Conference on Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-01843401 ; International Conference on Language Resources and Evaluation, May 2014, Reykjavik, Iceland (2014)
|
|
|
|
15 |
Automatic Language Identity Tagging on Word and Sentence-Level in Multilingual Text Sources: a Case-Study on Luxembourgish
|
|
|
|
In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) ; Ninth International Conference on Language Resources and Evaluation (LREC'14) ; https://hal.archives-ouvertes.fr/hal-01134776 ; Ninth International Conference on Language Resources and Evaluation (LREC'14), European Language Resources Association (ELRA), May 2014, Reykjavik, Iceland. pp.3300-3304 ; http://lrec2014.lrec-conf.org/en/ (2014)
|
|
|
|
16 |
Modélisation acoustico-phonétique de langues peu dotées : Études phonétiques et travaux de reconnaissance automatique en luxembourgois
|
|
|
|
In: Journées d'Etude sur la Parole ; https://hal.archives-ouvertes.fr/hal-01843399 ; Journées d'Etude sur la Parole, Jan 2014, Le Mans, France (2014)
|
|
|
|
17 |
Speech Alignment and Recognition Experiments for Luxembourgish
|
|
|
|
In: Proceedings of the 4th International Workshop on Spoken Language Technologies for Underresourced Languages ; 4th International Workshop on Spoken Language Technologies for Underresourced Languages ; https://hal.archives-ouvertes.fr/hal-01134824 ; 4th International Workshop on Spoken Language Technologies for Underresourced Languages, May 2014, Saint Petersburg, Russia. pp.53-60 ; http://www.mica.edu.vn/sltu2014/ (2014)
|
|
Abstract:
Luxembourgish, embedded in a multilingual context on the divide between Romance and Germanic cultures, remains one of Europe’s under-described languages. In this paper, we propose to study acoustic similarities between Luxembourgish and major contact languages (German, French, English) with the help of automatic speech alignment and recognition systems. Experiments were run using monolingual acoustic models trained on German, French and English together with (i) “multilingual” models trained on pooled speech data from these three languages, or with (ii) native Luxembourgish acoustic models from 1200 hours of untranscribed Luxembourgish audio data using unsupervised methods. We investigated whether Luxembourgish was globally better represented by one of the individual languages, by the multilingual model or by the native (unsupervised) model. While German provides globally the best acoustic match for native Luxembourgish, detailed analyses reveal language-specific preferences; in particular, English and Luxembourgish models are preferred on diphthongs. The first ASR results illustrate the accuracy of the various sets of supervised monolingual and multilingual models versus unsupervised Luxembourgish acoustic models. The ASR word error rate is progressively reduced from 60 to 25% on the development data set by unsupervised training of larger context-dependent models on increasing amounts of audio data.
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; acoustic modeling; forced alignment; language similarity; languages in contact; large vocabulary speech recognition; Luxembourgish; multilingual models; under-resourced languages; unsupervised training
|
|
URL: https://hal.archives-ouvertes.fr/hal-01134824
|
|
|
|
18 |
A First LVCSR System for Luxembourgish, a Low-Resourced European Language
|
|
|
|
In: Human Language Technology Challenges for Computer Science and Linguistics ; https://hal.archives-ouvertes.fr/hal-01135103 ; Zygmunt Vetulani; Joseph Mariani. Human Language Technology Challenges for Computer Science and Linguistics, 8387, Springer International Publishing, pp.479-490, 2014, 5th Language and Technology Conference, LTC 2011, Poznań, Poland, November 25--27, 2011, Revised Selected Papers, 978-3-319-08957-7. ⟨10.1007/978-3-319-08958-4_39⟩ (2014)
|
|
|
|
19 |
What we can learn from ASR errors about low-resourced languages: a case-study of Luxembourgish and Austrian
|
|
|
|
In: Errors by Humans and Machines in Multimedia, Multimodal, Multilingual Data Processing ; https://hal.archives-ouvertes.fr/hal-01843440 ; Errors by Humans and Machines in Multimedia, Multimodal, Multilingual Data Processing, Jan 2013, Ermenonville, France (2013)
|
|
|
|
20 |
What we can learn from ASR errors about low-resourced languages: a case-study of Luxembourgish and Austrian
|
|
|
|
In: Errors by Humans and Machines in Multimedia, Multimodal, Multilingual Data Processing (ERRARE 2013) ; https://halshs.archives-ouvertes.fr/halshs-01424902 ; Errors by Humans and Machines in Multimedia, Multimodal, Multilingual Data Processing (ERRARE 2013), Nov 2013, Ermenonville, France (2013)
|
|
|
|
|
|