1 |
On Generative Spoken Language Modeling from Raw Audio
|
|
Lakhotia, Kushal; Kharitonov, Evgeny; Hsu, Wei-Ning; Adi, Yossi; Polyak, Adam; Bolte, Benjamin; Nguyen, Tu-Anh; Copet, Jade; Baevski, Alexei; Mohamed, Abdelrahman; Dupoux, Emmanuel
|
|
In: Transactions of the Association for Computational Linguistics, The MIT Press, 2021 ; EISSN: 2307-387X ; https://hal.inria.fr/hal-03329219
|
|
Abstract:
We introduce Generative Spoken Language Modeling, the task of learning the acoustic and linguistic characteristics of a language from raw audio (no text, no labels), and a set of metrics to automatically evaluate the learned representations at acoustic and linguistic levels for both encoding and generation. We set up baseline systems consisting of a discrete speech encoder (returning pseudo-text units), a generative language model (trained on pseudo-text), and a speech decoder (generating a waveform from pseudo-text), all trained without supervision, and validate the proposed metrics with human evaluation. Across 3 speech encoders (CPC, wav2vec 2.0, HuBERT), we find that the number of discrete units (50, 100, or 200) matters in a task-dependent and encoder-dependent way, and that some combinations approach text-based systems.
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
|
|
URL: https://hal.inria.fr/hal-03329219/file/2102.01192.pdf ; https://hal.inria.fr/hal-03329219 ; https://hal.inria.fr/hal-03329219/document
|
|
BASE
|
|
4 |
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
|
|
|
|
In: NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing, Dec 2020, Virtual, France (2020) ; https://hal.archives-ouvertes.fr/hal-03070362
|
|
BASE
|
|
5 |
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling ...
|
|
|
|
BASE
|
|