1 |
The Zero Resource Speech Challenge 2021: Spoken language modelling
In: ISSN: 0162-8828 ; IEEE Transactions on Pattern Analysis and Machine Intelligence ; https://hal.inria.fr/hal-03329301 ; IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2021, pp.1-1. ⟨10.1109/TPAMI.2021.3083839⟩ (2021)
BASE

2 |
The Zero Resource Speech Challenge 2021: Spoken language modelling
In: Interspeech 2021 - Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-03329301 ; Interspeech 2021 - Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. ⟨10.1109/TPAMI.2021.3083839⟩ (2021)
BASE

3 |
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
In: INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-03329245 ; INTERSPEECH 2021 - Annual Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic (2021)
Abstract:
In Proceedings of Interspeech 2021 ; International audience ; We propose using self-supervised discrete representations for the task of speech resynthesis. To generate a disentangled representation, we separately extract low-bitrate representations for speech content, prosodic information, and speaker identity. This allows speech to be synthesized in a controllable manner. We analyze various state-of-the-art, self-supervised representation learning methods and shed light on the advantages of each method while considering reconstruction quality and disentanglement properties. Specifically, we evaluate the F0 reconstruction, speaker identification performance (for both resynthesis and voice conversion), recordings' intelligibility, and overall quality using subjective human evaluation. Lastly, we demonstrate how these representations can be used for an ultra-lightweight speech codec. Using the obtained representations, we can reach a rate of 365 bits per second while providing better speech quality than the baseline methods. Audio samples can be found at the following link: speechbot.github.io/resynthesis.
Keyword:
[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; Self-supervised learning; Speech codec; Speech generation; Speech resynthesis
URL: https://hal.inria.fr/hal-03329245/document https://hal.inria.fr/hal-03329245 https://hal.inria.fr/hal-03329245/file/2104.00355.pdf
BASE

4 |
Communicating artificial neural networks develop efficient color-naming systems
In: ISSN: 0027-8424 ; EISSN: 1091-6490 ; Proceedings of the National Academy of Sciences of the United States of America ; https://hal.inria.fr/hal-03329084 ; Proceedings of the National Academy of Sciences of the United States of America , National Academy of Sciences, 2021, 118 (12), ⟨10.1073/pnas.2016569118⟩ (2021)
BASE

7 |
The Zero Resource Speech Challenge 2021: Spoken language modelling ...
BASE

8 |
Textless Speech Emotion Conversion using Discrete and Decomposed Representations ...
BASE

9 |
Communicating artificial neural networks develop efficient color-naming systems
In: Proc Natl Acad Sci U S A (2021)
BASE

10 |
Compositionality and Generalization in Emergent Languages
In: ACL 2020 - 58th annual meeting of the Association for Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-02959466 ; ACL 2020 - 58th annual meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States (2020)
BASE

11 |
LIBRI-LIGHT: a benchmark for ASR with limited or no supervision
In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-02959460 ; ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2020, Barcelona / Virtual, Spain. pp.7669-7673, ⟨10.1109/ICASSP40776.2020.9052942⟩ (2020)
BASE

12 |
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
In: SLT 2020 - IEEE Spoken Language Technology Workshop ; https://hal.archives-ouvertes.fr/hal-03070321 ; SLT 2020 - IEEE Spoken Language Technology Workshop, Dec 2020, Shenzhen / Virtual, China (2020)
BASE

13 |
Anti-efficient encoding in emergent communication
In: https://hal.archives-ouvertes.fr/hal-02274205 ; 2019 (2019)
BASE

14 |
Word-order biases in deep-agent emergent communication
In: ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-02274157 ; ACL 2019 - 57th Annual Meeting of the Association for Computational Linguistics, Jul 2019, Florence, Italy (2019)
BASE

15 |
EGG: a toolkit for research on Emergence of lanGuage in Games
In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations ; https://hal.archives-ouvertes.fr/hal-02274229 ; Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP): System Demonstrations, Nov 2019, Hong Kong, China. ⟨10.18653/v1/D19-3010⟩ (2019)
BASE