1 | Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ... | BASE
4 | Text-Free Image-to-Speech Synthesis Using Learned Segmental Units ... | BASE
6 | Learning Hierarchical Discrete Linguistic Units from Visually-Grounded Speech ... | BASE
7 | Transfer Learning from Audio-Visual Grounding to Speech Recognition ... | BASE
8 | Unsupervised learning of spoken language with visual context | In: Neural Information Processing Systems (NIPS) (2019) | BASE
9 | Vision as an Interlingua: Learning Multilingual Semantic Embeddings of Untranscribed Speech ... | BASE
11 | Learning Word-Like Units from Joint Audio-Visual Analysis ... | BASE
12 | Unsupervised modeling of latent topics and lexical units in speech audio | BASE
13 | A Summary of the 2012 JHU CLSP workshop on zero resource speech technologies and models of early language acquisition | BASE
14 | Phonetic Landmark Detection for Automatic Language Identification | BASE