61 |
A phonologically weak contrast can induce phonetic overlap
|
|
|
|
In: Laboratory Phonology Conference ; https://hal.archives-ouvertes.fr/hal-01837204 ; Laboratory Phonology Conference, Jul 2016, Ithaca, United States (2016)
|
|
62 |
Breaking the unwritten language barrier: the BULB project
|
|
|
|
In: SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages ; https://halshs.archives-ouvertes.fr/halshs-01428027 ; SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, May 2016, Yogyakarta, Indonesia. ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
63 |
Innovative technologies for under-resourced language documentation: The BULB Project
|
|
|
|
In: CCURL proceedings ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC ; https://hal.archives-ouvertes.fr/hal-01350124 ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC, May 2016, Portoroz, Slovenia (2016)
|
|
65 |
Machine Translation Based Data Augmentation for Cantonese Keyword Spotting (Author's Manuscript)
|
|
|
|
66 |
Investigating Techniques for Low Resource Conversational Speech Recognition
|
|
|
|
67 |
Breaking the unwritten language barrier: the BULB project
|
|
|
|
In: SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages ; https://halshs.archives-ouvertes.fr/halshs-01428027 ; SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, May 2016, Yogyakarta, Indonesia. ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
68 |
Innovative technologies for under-resourced language documentation: The BULB Project
|
|
Lamel, Lori; Makasso, Emmanuel-Moselly; Rialland, Annie; Yvon, François; Besacier, Laurent; Gauthier, Elodie; Blachon, David; Van De Velde, Mark; Godard, Pierre; Bonneau-Maynard, Hélène; Stüker, Sebastian; Hamlaoui, Fatima; Ambouroue, Odette; Adda-Decker, Martine; Zerbian, Sabine; Kouarata, Guy-Noël; Adda, Gilles; Idiatov, Dmitry
|
|
In: CCURL proceedings ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC ; https://hal.archives-ouvertes.fr/hal-01350124 ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC, May 2016, Portoroz, Slovenia (2016)
|
|
Abstract:
The project Breaking the Unwritten Language Barrier (BULB), which brings together linguists and computer scientists, aims to support linguists in documenting unwritten languages. To achieve this, we will develop tools tailored to the needs of documentary linguists, building on technology and expertise from natural language processing, most prominently automatic speech recognition and machine translation. As a development and test bed we have chosen three less-resourced African languages from the Bantu family: Basaa, Myene and Embosi. Work within the project is divided into three main steps: 1) Collection of a large corpus of speech (100h per language) at a reasonable cost. After initial recording, the data is re-spoken by a reference speaker to enhance the signal quality and orally translated into French. 2) Automatic transcription of the Bantu languages at phoneme level and of the French translation at word level. The recognized Bantu phonemes and French words will then be automatically aligned. 3) Tool development. In close cooperation and discussion with the linguists, the speech and language technologists will design and implement tools that support the linguists in their work, taking into account the linguists' needs and the technology's capabilities. The data collection has begun for the three languages. For this we use standard mobile devices and dedicated software, LIG-AIKUMA, which offers a range of speech collection modes (recording, respeaking, translation and elicitation). LIG-AIKUMA's improved features include smart generation and handling of speaker metadata as well as respeaking and parallel audio data mapping.
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; automatic alignment; automatic phonetic transcription; Language documentation; unwritten languages
|
|
URL: https://hal.archives-ouvertes.fr/hal-01350124 https://hal.archives-ouvertes.fr/hal-01350124/document https://hal.archives-ouvertes.fr/hal-01350124/file/CCURL_BULB_2016.pdf
|
|
69 |
BULB: Breaking the Unwritten Language Barrier
|
|
|
|
In: Procedia Computer Science ; Computational Methods for Endangered Language Documentation and Description ; https://hal.archives-ouvertes.fr/hal-01836496 ; Computational Methods for Endangered Language Documentation and Description, May 2016, Yogyakarta, Indonesia. pp.8-14, ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
70 |
Analysing rhythm in ritual discourse in Yucatec Maya using automatic speech alignment
|
|
|
|
In: Interspeech 2015 Speech beyond speech ; https://halshs.archives-ouvertes.fr/halshs-01250490 ; Interspeech 2015 Speech beyond speech, Sep 2015, Dresden, Germany ; http://interspeech2015.org/ (2015)
|
|
71 |
Dropping of the Class-Prefix Consonant, Vowel Elision and Automatic Phonological Mining in Embosi (Bantu C 25)
|
|
|
|
In: ISBN: 9781574734652 ; Selected Proceedings of the 44th Annual Conference on African Linguistics ; https://halshs.archives-ouvertes.fr/halshs-01251202 ; Selected Proceedings of the 44th Annual Conference on African Linguistics, Ruth Kramer, Elizabeth C. Zsiga, and One Tlale Boyer, Cascadilla Proceedings Project, 2015, pp. 221-230 (2015)
|
|
72 |
Traduction de la parole dans le projet RAPMAT
|
|
|
|
In: Journées d'Études sur la Parole ; https://hal.archives-ouvertes.fr/hal-01843418 ; Journées d'Études sur la Parole, Jan 2014, Le Mans, France (2014)
|
|
73 |
Speech-to-Text Development for Slovak, a Low-Resourced Language
|
|
|
|
In: International Workshop on Spoken Languages Technologies for Under-resourced languages ; https://hal.archives-ouvertes.fr/hal-01843417 ; International Workshop on Spoken Languages Technologies for Under-resourced languages, May 2014, St. Petersburg, Russia (2014)
|
|
74 |
Comparing decoding strategies for subword-based keyword spotting in low-resourced languages
|
|
|
|
In: Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-01843408 ; Annual Conference of the International Speech Communication Association, ISCA, Sep 2014, Singapore, Singapore (2014)
|
|
75 |
Analyzing linguistic variation in a Romanian speech corpus through ASR errors
|
|
|
|
In: Laboratory Approaches to Romance Phonology ; https://hal.archives-ouvertes.fr/hal-01843421 ; Laboratory Approaches to Romance Phonology, Laboratoire Parole et Langage (UMR 6057), Aix-en-Provence, Sep 2014, Aix-en-Provence, France (2014)
|
|
76 |
Efficient Rule Scoring for Improved Grapheme-Based Lexicons
|
|
|
|
In: European Signal Processing Conference ; https://hal.archives-ouvertes.fr/hal-01843411 ; European Signal Processing Conference, Jan 2014, Lisbon, Portugal (2014)
|
|
77 |
Automatic language identity tagging on word and sentence-level in multilingual text sources: a case-study on Luxembourgish
|
|
|
|
In: International Conference on Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-01843401 ; International Conference on Language Resources and Evaluation, May 2014, Reykjavik, Iceland (2014)
|
|
78 |
Cross-Word Sub-Word Units for Low-Resource Keyword Spotting
|
|
|
|
In: International Workshop on Spoken Languages Technologies for Under-resourced languages ; https://hal.archives-ouvertes.fr/hal-01843415 ; International Workshop on Spoken Languages Technologies for Under-resourced languages, May 2014, St. Petersburg, Russia (2014)
|
|
79 |
Automatic Language Identity Tagging on Word and Sentence-Level in Multilingual Text Sources: a Case-Study on Luxembourgish
|
|
|
|
In: Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14) ; Ninth International Conference on Language Resources and Evaluation (LREC'14) ; https://hal.archives-ouvertes.fr/hal-01134776 ; Ninth International Conference on Language Resources and Evaluation (LREC'14), European Language Resources Association (ELRA), May 2014, Reykjavik, Iceland. pp.3300-3304 ; http://lrec2014.lrec-conf.org/en/ (2014)
|
|
80 |
Development of a Korean speech recognition system with little annotated data
|
|
|
|
In: International Workshop on Spoken Languages Technologies for Under-resourced languages ; https://hal.archives-ouvertes.fr/hal-01843405 ; International Workshop on Spoken Languages Technologies for Under-resourced languages, May 2014, St Petersburg, Russia (2014)
|
|