Hits 1 – 1 of 1
Automated audio captioning by fine-tuning BART with AudioSet tags
Gontier, Félix; Serizel, Romain; Cerisara, Christophe
In: DCASE 2021 - 6th Workshop on Detection and Classification of Acoustic Scenes and Events, Nov 2021, Virtual, Spain (2021); https://hal.inria.fr/hal-03522488
Abstract:
Automated audio captioning is the multimodal task of describing environmental audio recordings with fluent natural language. Most current methods utilize pre-trained analysis models to extract relevant semantic content from the audio input. However, prior information on language modeling is rarely introduced, and corresponding architectures are limited in capacity due to data scarcity. In this paper, we present a method leveraging the linguistic information contained in BART, a large-scale conditional language model with general-purpose pre-training. Caption generation is conditioned on sequences of textual AudioSet tags. This input is enriched with temporally aligned audio embeddings that allow the model to improve sound event recognition. The full BART architecture is fine-tuned with few additional parameters. Experimental results demonstrate that, beyond the scaling properties of the architecture, language-only pre-training improves text quality in the multimodal setting of audio captioning. The best model achieves state-of-the-art performance on AudioCaps with 46.5 SPIDEr.
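The abstract describes conditioning BART's caption generation on sequences of textual AudioSet tags. As a minimal illustration (not the authors' code; the function name and separator are assumptions), detected tag labels can be serialized into a single text prompt that a pre-trained seq2seq encoder would consume:

```python
# Hypothetical sketch of the tag-conditioning step described in the abstract:
# AudioSet tag labels predicted for a recording are serialized into one
# textual sequence, which a seq2seq model such as BART could then encode.
# In the paper this input is further enriched with aligned audio embeddings.

def build_tag_prompt(audioset_tags, separator=", "):
    """Serialize a list of AudioSet tag labels into a single
    conditioning string for the encoder input."""
    # Deduplicate while preserving order, then join into one sequence.
    seen = set()
    unique = [t for t in audioset_tags if not (t in seen or seen.add(t))]
    return separator.join(unique)

tags = ["Speech", "Dog", "Bark", "Dog"]
print(build_tag_prompt(tags))  # Speech, Dog, Bark
```

The exact serialization (ordering, separator, special tokens) is a design choice of the actual system; this sketch only shows the general idea of turning tags into encoder text.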
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Audio captioning; audio tagging; BART; language models; transfer learning
URL:
https://hal.inria.fr/hal-03522488
https://hal.inria.fr/hal-03522488/file/DCASE2021Workshop_Gontier_57.pdf
https://hal.inria.fr/hal-03522488/document
Source: BASE
© 2013 - 2024 Lin|gu|is|tik