Hits 1 – 1 of 1
Automated audio captioning by fine-tuning BART with AudioSet tags
Gontier, Félix; Serizel, Romain; Cerisara, Christophe
In: DCASE 2021 - 6th Workshop on Detection and Classification of Acoustic Scenes and Events, Nov 2021, Virtual, Spain (2021); https://hal.inria.fr/hal-03522488
Abstract:
Automated audio captioning is the multimodal task of describing environmental audio recordings with fluent natural language. Most current methods utilize pre-trained analysis models to extract relevant semantic content from the audio input. However, prior information on language modeling is rarely introduced, and corresponding architectures are limited in capacity due to data scarcity. In this paper, we present a method leveraging the linguistic information contained in BART, a large-scale conditional language model with general-purpose pre-training. Caption generation is conditioned on sequences of textual AudioSet tags. This input is enriched with temporally aligned audio embeddings that allow the model to improve sound event recognition. The full BART architecture is fine-tuned with few additional parameters. Experimental results demonstrate that, beyond the scaling properties of the architecture, language-only pre-training improves text quality in the multimodal setting of audio captioning. The best model achieves state-of-the-art performance on AudioCaps with 46.5 SPIDEr.
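The abstract describes conditioning BART's caption generation on sequences of textual AudioSet tags. As a minimal illustration (not the authors' code; the function name and separator are assumptions), detected tag labels can be serialized into a single text prompt that a pre-trained seq2seq encoder would consume:

```python
# Hypothetical sketch of the tag-conditioning step described in the abstract:
# AudioSet tag labels predicted for a recording are serialized into one
# textual sequence, which a seq2seq model such as BART could then encode.
# In the paper this input is further enriched with aligned audio embeddings.

def build_tag_prompt(audioset_tags, separator=", "):
    """Serialize a list of AudioSet tag labels into a single
    conditioning string for the encoder input."""
    # Deduplicate while preserving order, then join into one sequence.
    seen = set()
    unique = [t for t in audioset_tags if not (t in seen or seen.add(t))]
    return separator.join(unique)

tags = ["Speech", "Dog", "Bark", "Dog"]
print(build_tag_prompt(tags))  # Speech, Dog, Bark
```

The exact serialization (ordering, separator, special tokens) is a design choice of the actual system; this sketch only shows the general idea of turning tags into encoder text.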
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Audio captioning; audio tagging; BART; language models; transfer learning
URL:
https://hal.inria.fr/hal-03522488
https://hal.inria.fr/hal-03522488/file/DCASE2021Workshop_Gontier_57.pdf
https://hal.inria.fr/hal-03522488/document
Source: BASE
© 2013 - 2024 Lin|gu|is|tik