
Search in the Catalogues and Directories

Hits 1 – 12 of 12

1
A Very Low Resource Language Speech Corpus for Computational Language Documentation Experiments
In: Language Resources and Evaluation Conference (LREC) ; https://hal.archives-ouvertes.fr/hal-01807093 ; Language Resources and Evaluation Conference (LREC), Nicoletta Calzolari (Conference chair) and Khalid Choukri and Christopher Cieri and Thierry Declerck and Sara Goggi and Koiti Hasida and Hitoshi Isahara and Bente Maegaard and Joseph Mariani and Hélène Mazo and Asuncion Moreno and Jan Odijk and Stelios Pi, May 2018, Miyazaki, Japan (2018)
Abstract: Most speech and language technologies are trained with massive amounts of speech and text information. However, most of the world languages do not have such resources and some even lack a stable orthography. Building systems under these almost zero resource conditions is not only promising for speech technology but also for computational language documentation. The goal of computational language documentation is to help field linguists to (semi-)automatically analyze and annotate audio recordings of endangered, unwritten languages. Example tasks are automatic phoneme discovery or lexicon discovery from the speech signal. This paper presents a speech corpus collected during a realistic language documentation process. It is made up of 5k speech utterances in Mboshi (Bantu C25) aligned to French text translations. Speech transcriptions are also made available: they correspond to a non-standard graphemic form close to the language phonology. We detail how the data was collected, cleaned and processed and we illustrate its use through a zero-resource task: spoken term discovery. The dataset is made available to the community for reproducible computational language documentation experiments and their evaluation.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; field linguistics; language documentation; spoken term discovery; unwritten languages; word segmentation; zero resource technologies
URL: https://hal.archives-ouvertes.fr/hal-01807093/document
https://hal.archives-ouvertes.fr/hal-01807093/file/lrec2018_mboshi_final-3.pdf
https://hal.archives-ouvertes.fr/hal-01807093
BASE
2
The Impact of Production Complexity in German L2 by French Native Speakers: Focus on /h/ and Vowel Duration Contrast
In: Phonology in Protolanguage and Interlanguage ; https://halshs.archives-ouvertes.fr/halshs-01737853 ; Elena Babatsouli; David Ingram. Phonology in Protolanguage and Interlanguage, Equinox, pp.255-285, In press (2018)
BASE
3
Studying Vowel Variation in French-Algerian Arabic Code-switched Speech
In: Interspeech 2018 ; https://halshs.archives-ouvertes.fr/halshs-01969143 ; Interspeech 2018, Sep 2018, Hyderabad, India. ⟨10.21437/interspeech.2018-2381⟩ (2018)
BASE
4
Quand les voyelles longues et brèves ne tiennent pas en place : la qualité vocalique en allemand L2
In: XXXIIe Journées d’Études sur la Parole ; https://halshs.archives-ouvertes.fr/halshs-02130881 ; XXXIIe Journées d’Études sur la Parole, Jun 2018, Aix-en-Provence, France. pp.64-71, ⟨10.21437/JEP.2018-8⟩ (2018)
BASE
5
Studying Vowel Variation in French-Algerian Arabic Code-switched Speech
In: Interspeech 2018 ; https://halshs.archives-ouvertes.fr/halshs-02130906 ; Interspeech 2018, Sep 2018, Hyderabad, India. pp.2753-2757, ⟨10.21437/Interspeech.2018-2381⟩ (2018)
BASE
6
The French-Algerian Code-Switching Triggered audio corpus (FACST)
In: LREC 2018, Eleventh International Conference on Language Resources and Evaluation ; LREC 2018 11th edition of the Language Resources and Evaluation Conference ; https://halshs.archives-ouvertes.fr/halshs-01969152 ; LREC 2018 11th edition of the Language Resources and Evaluation Conference, May 2018, Miyazaki, Japan (2018)
BASE
7
Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages
In: Workshop on Computational Research in Phonetics, Phonology, and Morphology ; https://hal.archives-ouvertes.fr/hal-01910757 ; Workshop on Computational Research in Phonetics, Phonology, and Morphology, Oct 2018, Brussels, Belgium. pp.32-42, ⟨10.18653/v1/P17⟩ (2018)
BASE
8
Studying variation in Romanian: deletion of the definite article -l in continuous speech
In: Linguistics Vanguard ; https://hal.archives-ouvertes.fr/hal-01837197 ; Linguistics Vanguard, 2018, 5 (1), 17 p. (2018)
BASE
9
Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville)
In: 11th edition of the Language Resources and Evaluation Conference (LREC 2018) ; https://hal.archives-ouvertes.fr/hal-01710043 ; 11th edition of the Language Resources and Evaluation Conference (LREC 2018), ELRA, May 2018, Miyazaki, Japan (2018)
BASE
10
A corpus based study of morpheme deletion in a low resourced language: A case study for Embosi
In: Annual Meeting of the Linguistic Society of America ; https://hal.archives-ouvertes.fr/hal-01837164 ; Annual Meeting of the Linguistic Society of America, Jan 2018, Salt Lake City, United States (2018)
BASE
11
The French-Algerian Code-Switching Triggered audio corpus (FACST)
In: International Conference on Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-01837163 ; International Conference on Language Resources and Evaluation, ELRA, May 2018, Miyazaki, Japan (2018)
BASE
12
Studying Vowel Variation in French-Algerian Arabic Code-switched Speech
In: Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-02387386 ; Annual Conference of the International Speech Communication Association, ISCA, Sep 2018, Hyderabad, India (2018)
BASE
