4 |
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
|
|
Godard, P.; Adda, G; Adda-Decker, Martine; Benjumea, J; Besacier, Laurent; Cooper-Leavitt, J; Kouarata, G-N; Lamel, L; Maynard, H; Müller, M.; Rialland, A; Stüker, S.; Yvon, F.; Zanon-Boito, M
|
|
In: Language Resources and Evaluation Conference (LREC) ; https://hal.archives-ouvertes.fr/hal-01807093 ; Language Resources and Evaluation Conference (LREC), Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Pi, May 2018, Miyazaki, Japan (2018)
|
|
Abstract:
Most speech and language technologies are trained with massive amounts of speech and text information. However, most of the world's languages do not have such resources and some even lack a stable orthography. Building systems under these almost zero resource conditions is not only promising for speech technology but also for computational language documentation. The goal of computational language documentation is to help field linguists to (semi-)automatically analyze and annotate audio recordings of endangered, unwritten languages. Example tasks are automatic phoneme discovery or lexicon discovery from the speech signal. This paper presents a speech corpus collected during a realistic language documentation process. It is made up of 5k speech utterances in Mboshi (Bantu C25) aligned to French text translations. Speech transcriptions are also made available: they correspond to a non-standard graphemic form close to the language phonology. We detail how the data was collected, cleaned and processed and we illustrate its use through a zero-resource task: spoken term discovery. The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation.
|
|
Keyword:
Computer Science [cs] / Computation and Language [cs.CL]; field linguistics; language documentation; spoken term discovery; unwritten languages; word segmentation; zero resource technologies
|
|
URL: https://hal.archives-ouvertes.fr/hal-01807093/document https://hal.archives-ouvertes.fr/hal-01807093/file/lrec2018_mboshi_final-3.pdf https://hal.archives-ouvertes.fr/hal-01807093
|
|
BASE
|
|
|
|
5 |
Understanding, scripting and staging emotional experiences: On some central research topics on emotion and language.
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-01802931 ; 2018 (2018)
|
|
BASE
|
|
|
|
6 |
« Ne joue pas avec ton couteau » : la phraséologie des manières de table ["Don't play with your knife": the phraseology of table manners]
|
|
|
|
In: Lexeme, Phraseme, Konstruktionen. Aktuelle Beiträge zu Lexikologie und Phraseologie ; https://hal.archives-ouvertes.fr/hal-02125236 ; Martina Nicklaus; Nora Wirtz; Marcella Costa; Karin Ewert-Kling; Wiebke Vogt. Lexeme, Phraseme, Konstruktionen. Aktuelle Beiträge zu Lexikologie und Phraseologie, Peter Lang, pp.161-182, 2018 (2018)
|
|
BASE
|
|
|
|
8 |
Bambine e ragazzi bilingui nelle classi multietniche di Torino [Bilingual girls and boys in the multiethnic classrooms of Turin]
|
|
|
|
BASE
|
|
|
|
9 |
Measurement of child-directed speech: Bridging the gap between research and practice
|
|
|
|
BASE
|
|
|
|
11 |
Best practice service delivery for school-aged children with language disorders: What does the evidence say?
|
|
|
|
BASE
|
|
|
|
13 |
Украинский конфликт в зеркале корпусной лингвистики [The Ukrainian conflict in the mirror of corpus linguistics]
|
|
|
|
In: Weiss, Daniel (2018). Украинский конфликт в зеркале корпусной лингвистики. Slavica Helvetica, 86:321-348. (2018)
|
|
BASE
|
|
|
|
15 |
The Ethnocultural Potential of Voice Forms and Its Discourse Actualization
|
|
|
|
In: Russian journal of linguistics: Vestnik RUDN, Vol 22, Iss 4, Pp 874-894 (2018)
|
|
BASE
|
|
|
|
16 |
Le langage spécialisé du domaine médical au Maroc : entre théorie et pratique [The specialized language of the medical field in Morocco: between theory and practice]
|
|
|
|
In: Studii si Cercetari Filologice: Seria Limbi Straine Aplicate, Iss 17, Pp 193-202 (2018)
|
|
BASE
|
|
|
|
|
|