
Search in the Catalogues and Directories

Hits 1 – 2 of 2

1
What does the Canary Say? Low-Dimensional GAN Applied to Birdsong
In: https://hal.inria.fr/hal-03244723 ; 2021 (2021)
Abstract: The generation of speech, and more generally complex animal vocalizations, by artificial systems is a difficult problem which has recently been addressed using various techniques in artificial intelligence. Generative Adversarial Networks (GANs) have shown very good abilities for generating images, and more recently sounds. The usability of a GAN generating a vocal repertoire relies in part on our understanding of the representations of the various sounds in the GAN latent space. Here, we aim to test the ability of WaveGAN to produce a set of canary syllables and constrain the latent space to a small dimension. We trained WaveGANs with varying latent space dimensions (from 1 to 6) on a large dataset of canary syllables (16000 renditions of 16 different syllable types). The sounds produced by the generators are identified and evaluated by an RNN-based classifier trained on the same dataset. This quantitative evaluation is paired with a qualitative evaluation of the GAN output spectrograms across GAN training epochs and latent dimensions, comparing multiple instances of the training for each condition. Altogether, our results show that a latent space of dimension 3 is enough to produce a varied repertoire of sounds, often indistinguishable in quality from real canary syllables, spanning all the syllable types of the dataset. Importantly, we show that the 3-dimensional GAN generalizes by interpolating between the various syllable types. We rely on UMAP representations to qualitatively show the similarities between the training data and the generated data, and between the generated syllables and the interpolations produced. Exploring the latent representations of syllable types, we show that they form well-identifiable subspaces of the latent space. This study provides tools to train simple sensorimotor models, as inverse models, from perceived sounds to motor representations of the same sounds. Both the RNN-based classifier and the low-dimensional GAN provide a way to learn the mappings of perceived and produced sounds.
Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]; [SDV.NEU]Life Sciences [q-bio]/Neurons and Cognition [q-bio.NC]; Birdsong; Generative adversarial network; Latent space; Sound generation
URL: https://hal.inria.fr/hal-03244723/file/Pagliarini2021_canary_GAN__HAL-v1.pdf
https://hal.inria.fr/hal-03244723/document
https://hal.inria.fr/hal-03244723
BASE
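The interpolation between syllable types mentioned in the abstract above can be pictured with a minimal sketch. It is not taken from the paper: it only builds evenly spaced points on a straight line between two latent vectors of dimension 3. The function name `interpolate_latents`, the uniform sampling range of -1 to 1, and the number of steps are assumptions made for illustration, and the trained generator that would turn each vector into a sound is not included.

```python
import random

LATENT_DIM = 3  # the abstract reports that a latent space of dimension 3 is enough


def interpolate_latents(z_start, z_end, n_steps):
    """Return n_steps latent vectors evenly spaced on the segment z_start -> z_end.

    Hypothetical helper for illustration; the paper's own procedure may differ.
    """
    if len(z_start) != len(z_end):
        raise ValueError("latent vectors must have the same dimension")
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    path = []
    for i in range(n_steps):
        t = i / (n_steps - 1)  # t runs from 0.0 to 1.0 inclusive
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_start, z_end)])
    return path


# Two latent points drawn at random (assumed range: uniform in [-1, 1]).
rng = random.Random(0)
z_a = [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]
z_b = [rng.uniform(-1.0, 1.0) for _ in range(LATENT_DIM)]

# Each row of `path` would be fed to a trained generator to produce one sound.
path = interpolate_latents(z_a, z_b, n_steps=5)
```

Passing each vector of `path` through a generator and classifying the outputs would show where along the segment one syllable type gives way to another, which is the kind of question the abstract's latent-space exploration addresses.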
2
Modeling the neural network responsible for song learning ; Modélisation du réseau neuronal responsable de l'apprentissage du chant chez l'oiseau chanteur
Pagliarini, Silvia. - : HAL CCSD, 2021
In: https://tel.archives-ouvertes.fr/tel-03217834 ; Modeling and Simulation. Université de Bordeaux, 2021. English. ⟨NNT : 2021BORD0107⟩ (2021)
BASE

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings