21 |
The Zero Resource Speech Challenge 2021: Spoken language modelling ...
BASE
22 |
Textless Speech Emotion Conversion using Discrete and Decomposed Representations ...
23 |
Communicating artificial neural networks develop efficient color-naming systems
In: Proc Natl Acad Sci U S A (2021)
24 |
Does Infant-Directed Speech Help Phonetic Learning? A Machine Learning Investigation
25 |
Do Infants Really Learn Phonetic Categories?
In: Open Mind (Camb) (2021)
26 |
Early phonetic learning without phonetic categories: Insights from large-scale simulations on realistic input
In: Proc Natl Acad Sci U S A (2021)
27 |
Seshat: A tool for managing and verifying annotation campaigns of audio data
In: LREC 2020 - 12th Language Resources and Evaluation Conference ; https://hal.archives-ouvertes.fr/hal-02496041 ; LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France. pp.6976-6982 (2020)
28 |
An open-source voice type classifier for child-centered daylong recordings
In: Interspeech 2020 - Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-02989487 ; Interspeech 2020 - Conference of the International Speech Communication Association, Oct 2020, Shanghai / Virtual, China (2020)
29 |
The Zero Resource Speech Benchmark 2021: Metrics and baselines for unsupervised spoken language modeling
In: NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing ; https://hal.archives-ouvertes.fr/hal-03070362 ; NeurIPS Workshop on Self-Supervised Learning for Speech and Audio Processing, Dec 2020, Virtual, France (2020)
30 |
Modelling Perceptual Effects of Phonology with ASR Systems
In: CogSci 2020 - 42nd Annual Virtual Meeting of the Cognitive Science Society ; https://hal.archives-ouvertes.fr/hal-03070281 ; CogSci 2020 - 42nd Annual Virtual Meeting of the Cognitive Science Society, Jul 2020, Virtual, France (2020)
31 |
Vocal markers from sustained phonation in Huntington's Disease
In: INTERSPEECH 2020 - Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-03070388 ; INTERSPEECH 2020 - Annual Conference of the International Speech Communication Association, Oct 2020, Shanghai / Virtual, China (2020)
32 |
Towards unsupervised learning of speech features in the wild
In: SLT 2020 : IEEE Spoken Language Technology Workshop ; https://hal.archives-ouvertes.fr/hal-03070411 ; SLT 2020 : IEEE Spoken Language Technology Workshop, Dec 2020, Shenzhen / Virtual, China (2020)
33 |
The Zero Resource Speech Challenge 2020: Discovering discrete subword and word units
In: Interspeech 2020 - Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-02962224 ; Interspeech 2020 - Conference of the International Speech Communication Association, Oct 2020, Shanghai / Virtual, China (2020)
34 |
Does bilingual input hurt? A simulation of language discrimination and clustering using i-vectors
In: CogSci 2020 - 42nd Annual Virtual Meeting of the Cognitive Science Society ; https://hal.archives-ouvertes.fr/hal-02959451 ; CogSci 2020 - 42nd Annual Virtual Meeting of the Cognitive Science Society, Jul 2020, Toronto / Virtual, Canada (2020)
35 |
Compositionality and Generalization in Emergent Languages
In: ACL 2020 - 58th annual meeting of the Association for Computational Linguistics ; https://hal.archives-ouvertes.fr/hal-02959466 ; ACL 2020 - 58th annual meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States (2020)
36 |
"LazImpa": Lazy and Impatient neural agents learn to communicate efficiently
In: CONLL 2020 - The SIGNLL Conference on Computational Natural Language Learning ; https://hal.archives-ouvertes.fr/hal-03070404 ; CONLL 2020 - The SIGNLL Conference on Computational Natural Language Learning, Nov 2020, Virtual, France (2020)
37 |
Evaluating the reliability of acoustic speech embeddings
In: INTERSPEECH 2020 - Annual Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-02977539 ; INTERSPEECH 2020 - Annual Conference of the International Speech Communication Association, Oct 2020, Shanghai / Virtual, China (2020)
Abstract:
Speech embeddings are fixed-size acoustic representations of variable-length speech sequences. They are increasingly used for a variety of tasks ranging from information retrieval to unsupervised term discovery and speech segmentation. However, there is currently no clear methodology to compare or optimize the quality of these embeddings in a task-neutral way. Here, we systematically compare two popular metrics, ABX discrimination and Mean Average Precision (MAP), on 5 languages across 17 embedding methods, ranging from supervised to fully unsupervised, and using different loss functions (autoencoders, correspondence autoencoders, siamese). Then we use ABX and MAP to predict performance on a new downstream task: the unsupervised estimation of the frequencies of speech segments in a given corpus. We find that overall, ABX and MAP correlate with one another and with frequency estimation. However, substantial discrepancies appear in the fine-grained distinctions across languages and/or embedding methods. This makes it unrealistic at present to propose a task-independent silver bullet method for computing the intrinsic quality of speech embeddings. There is a need for more detailed analysis of the metrics currently used to evaluate such embeddings.
Keywords:
[INFO.INFO-CL] Computer Science [cs] / Computation and Language [cs.CL]; Evaluation metrics; Frequency estimation; k-nearest neighbours; Representation learning; Speech embeddings; Unsupervised speech processing
URL: https://hal.inria.fr/hal-02977539 https://hal.inria.fr/hal-02977539/file/Thu-3-2-6.pdf https://hal.inria.fr/hal-02977539/document
38 |
Speech technology for unwritten languages
In: ISSN: 2329-9290 ; EISSN: 2329-9304 ; IEEE/ACM Transactions on Audio, Speech and Language Processing ; https://hal.inria.fr/hal-02480675 ; IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2020, ⟨10.1109/TASLP.2020.2973896⟩ (2020)
39 |
Speaker detection in the wild: Lessons learned from JSALT 2019
In: Odyssey 2020 The Speaker and Language Recognition Workshop ; https://hal.archives-ouvertes.fr/hal-02417632 ; Odyssey 2020 The Speaker and Language Recognition Workshop, Nov 2020, Tokyo, Japan (2020)
40 |
Libri-Light: a benchmark for ASR with limited or no supervision
In: ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing ; https://hal.archives-ouvertes.fr/hal-02959460 ; ICASSP 2020 - 2020 IEEE International Conference on Acoustics, Speech and Signal Processing, May 2020, Barcelona / Virtual, Spain. pp.7669-7673, ⟨10.1109/ICASSP40776.2020.9052942⟩ (2020)