1 |
Speech technology for unwritten languages
|
|
Scharenborg, Odette; Besacier, Laurent; Black, Alan; Hasegawa-Johnson, Mark; Metze, Florian; Neubig, Graham; Stuker, Sebastian; Godard, Pierre; Müller, Markus; Ondel, Lucas; Palaskar, Shruti; Arthur, Philip; Ciannella, Francesco; Du, Mingxing; Larsen, Elin; Merkx, Danny; Riad, Rachid; Wang, Liming; Dupoux, Emmanuel
|
|
In: ISSN: 2329-9290 ; EISSN: 2329-9304 ; IEEE/ACM Transactions on Audio, Speech and Language Processing ; https://hal.inria.fr/hal-02480675 ; IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2020, ⟨10.1109/TASLP.2020.2973896⟩ (2020)
|
|
Abstract:
International audience ; Speech technology plays an important role in our everyday life. Speech is, among others, used for human-computer interaction, including, for instance, information retrieval and on-line shopping. In the case of an unwritten language, however, speech technology is unfortunately difficult to create, because it cannot be created by the standard combination of pre-trained speech-to-text and text-to-speech subsystems. The research presented in this paper takes the first steps towards speech technology for unwritten languages. Specifically, the aim of this work was 1) to learn speech-to-meaning representations without using text as an intermediate representation, and 2) to test the sufficiency of the learned representations to regenerate speech or translated text, or to retrieve images that depict the meaning of an utterance in an unwritten language. The results suggest that building systems that go directly from speech-to-meaning and from meaning-to-speech, bypassing the need for text, is possible.
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
|
|
URL: https://hal.inria.fr/hal-02480675 https://doi.org/10.1109/TASLP.2020.2973896
|
|
BASE
|
|
Hide details
|
|
2 |
Controlling Utterance Length in NMT-based Word Segmentation with Attention
|
|
|
|
In: International Workshop on Spoken Language Translation ; https://hal.archives-ouvertes.fr/hal-02343206 ; International Workshop on Spoken Language Translation, Nov 2019, Hong-Kong, China (2019)
|
|
BASE
|
|
Show details
|
|
3 |
Unsupervised word discovery for computational language documentation ; Découverte non-supervisée de mots pour outiller la linguistique de terrain
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-02286425 ; Artificial Intelligence [cs.AI]. Université Paris Saclay (COmUE), 2019. English. ⟨NNT : 2019SACLS062⟩ (2019)
|
|
BASE
|
|
Show details
|
|
4 |
Unsupervised Word Segmentation from Speech with Attention
|
|
|
|
In: Interspeech 2018 ; https://hal.archives-ouvertes.fr/hal-01818092 ; Interspeech 2018, Sep 2018, Hyderabad, India (2018)
|
|
BASE
|
|
Show details
|
|
5 |
Bayesian models for unit discovery on a very low resource language
|
|
|
|
In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) ; https://hal.archives-ouvertes.fr/hal-01709589 ; IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Alberta, Canada (2018)
|
|
BASE
|
|
Show details
|
|
6 |
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the “Speaking rosetta” JSALT 2017 workshop
|
|
|
|
In: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-01709578 ; ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada (2018)
|
|
BASE
|
|
Show details
|
|
7 |
Unsupervised Word Segmentation: does tone matter ?
|
|
|
|
In: International Conference on Intelligent Text Processing and Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-01910756 ; International Conference on Intelligent Text Processing and Computational Linguistics, Mar 2018, Hanoï, Vietnam (2018)
|
|
BASE
|
|
Show details
|
|
8 |
Adaptor Grammars for the Linguist: Word Segmentation Experiments for Very Low-Resource Languages
|
|
|
|
In: Workshop on Computational Research in Phonetics, Phonology, and Morphology ; https://hal.archives-ouvertes.fr/hal-01910757 ; Workshop on Computational Research in Phonetics, Phonology, and Morphology, Oct 2018, Bruxelles, Belgium. pp.32 - 42, ⟨10.18653/v1/P17⟩ (2018)
|
|
BASE
|
|
Show details
|
|
9 |
Parallel Corpora in Mboshi (Bantu C25, Congo-Brazzaville)
|
|
|
|
In: 11th edition of the Language Resources and Evaluation Conference (LREC 2018) ; https://hal.archives-ouvertes.fr/hal-01710043 ; 11th edition of the Language Resources and Evaluation Conference (LREC 2018), ELRA, May 2018, Miyazaki, Japan (2018)
|
|
BASE
|
|
Show details
|
|
10 |
BULB: Breaking the Unwritten Language Barrier
|
|
|
|
In: Procedia Computer Science ; Computational Methods for Endangered Language Documentation and Description ; https://hal.archives-ouvertes.fr/hal-01836496 ; Computational Methods for Endangered Language Documentation and Description, May 2016, Yogyakarta, Indonesia. pp.8-14, ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
BASE
|
|
Show details
|
|
11 |
Breaking the unwritten language barrier: the BULB project
|
|
|
|
In: SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages ; https://halshs.archives-ouvertes.fr/halshs-01428027 ; SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, May 2016, Yogyakarta, Indonesia. ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
BASE
|
|
Show details
|
|
12 |
Preliminary Experiments on Unsupervised Word Discovery in Mboshi
|
|
|
|
In: Interspeech 2016 proceedings ; Interspeech 2016 ; https://hal.archives-ouvertes.fr/hal-01350119 ; Interspeech 2016, Sep 2016, San-Francisco, United States (2016)
|
|
BASE
|
|
Show details
|
|
13 |
Innovative technologies for under-resourced language documentation: The BULB Project
|
|
|
|
In: CCURL proceedings ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC ; https://hal.archives-ouvertes.fr/hal-01350124 ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC, May 2016, Portoroz, Slovenia (2016)
|
|
BASE
|
|
Show details
|
|
14 |
Breaking the unwritten language barrier: the BULB project
|
|
|
|
In: SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages ; https://halshs.archives-ouvertes.fr/halshs-01428027 ; SLTU-2016 5th Workshop on Spoken Language Technologies for Under-resourced languages, May 2016, Yogyakarta, Indonesia. ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
BASE
|
|
Show details
|
|
15 |
Innovative technologies for under-resourced language documentation: The BULB Project
|
|
|
|
In: CCURL proceedings ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC ; https://hal.archives-ouvertes.fr/hal-01350124 ; Workshop CCURL 2016 - Collaboration and Computing for Under-Resourced Languages - LREC, May 2016, Portoroz, Slovenia (2016)
|
|
BASE
|
|
Show details
|
|
16 |
BULB: Breaking the Unwritten Language Barrier
|
|
|
|
In: Procedia Computer Science ; Computational Methods for Endangered Language Documentation and Description ; https://hal.archives-ouvertes.fr/hal-01836496 ; Computational Methods for Endangered Language Documentation and Description, May 2016, Yogyakarta, Indonesia. pp.8-14, ⟨10.1016/j.procs.2016.04.023⟩ (2016)
|
|
BASE
|
|
Show details
|
|
|
|