1 |
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities
|
|
|
|
In: ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22) ; https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618 ; 2022 (2022)
|
|
Abstract:
Whether to retrieve, answer, translate, or reason, multimodality opens up new challenges and perspectives. In this context, we are interested in answering questions about named entities grounded in a visual context using a Knowledge Base (KB). To benchmark this task, called KVQAE (Knowledge-based Visual Question Answering about named Entities), we provide ViQuAE, a dataset of 3.7K questions paired with images. This is the first KVQAE dataset to cover a wide range of entity types (e.g. persons, landmarks, and products). The dataset is annotated using a semi-automatic method. We also propose a KB composed of 1.5M Wikipedia articles paired with images. To set a baseline on the benchmark, we address KVQAE as a two-stage problem: Information Retrieval and Reading Comprehension, with both zero- and few-shot learning methods. The experiments empirically demonstrate the difficulty of the task, especially when questions are not about persons. This work paves the way for better multimodal entity representations and question answering. The dataset, KB, code, and semi-automatic annotation pipeline are freely available at https://github.com/PaulLerner/ViQuAE.
|
|
Keyword:
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM]; dataset; knowledge-based visual question answering; multimodal
|
|
URL: https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618 ; https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618/document ; https://doi.org/10.1145/3477495.3531753 ; https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618/file/lerner_sigir_2022_camera.pdf
|
|
BASE
|
|
2 |
Unsupervised quantification of entity consistency between photos and text in real-world news ...
|
|
Müller-Budack, Eric. - : Hannover : Institutionelles Repositorium der Leibniz Universität Hannover, 2022
|
|
BASE
|
|
3 |
Supporting an effective review of telecollaboration for second language learning by visualising the participation and engagement at Dublin City University
|
|
|
|
In: Lee, Hyowon orcid:0000-0003-4395-7702 , Scriney, Michael orcid:0000-0001-6813-2630 , Dey-Plissonneau, Aparajita and Smeaton, Alan orcid:0000-0003-1028-8389 (2021) Supporting an effective review of telecollaboration for second language learning by visualising the participation and engagement at Dublin City University. In: Virtual Exchange in Higher Education: Charting the Irish Experience, 17 Sept 2021, Online vs MS Teams. (2021)
|
|
BASE
|
|
4 |
Sign and Search: Sign Search Functionality for Sign Language Lexica ...
|
|
|
|
BASE
|
|
5 |
Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text ...
|
|
|
|
BASE
|
|
6 |
Recommending Themes for Ad Creative Design via Visual-Linguistic Representations ...
|
|
|
|
BASE
|
|
7 |
Fuzzy Logic Based Integration of Web Contextual Linguistic Structures for Enriching Conceptual Visual Representations ...
|
|
|
|
BASE
|
|
8 |
MusicTM-Dataset for Joint Representation Learning among Sheet Music, Lyrics, and Musical Audio ...
|
|
|
|
BASE
|
|
9 |
Utilization of multimodal interaction signals for automatic summarisation of academic presentations
|
|
Curtis, Keith. - : Dublin City University. School of Computing, 2018
|
|
In: Curtis, Keith (2018) Utilization of multimodal interaction signals for automatic summarisation of academic presentations. PhD thesis, Dublin City University. (2018)
|
|
BASE
|
|
10 |
Multimodal Machine Translation with Reinforcement Learning ...
|
|
|
|
BASE
|
|
11 |
ImproteK: introducing scenarios into human-computer music improvisation
|
|
|
|
In: ACM Computers in Entertainment ; https://hal.archives-ouvertes.fr/hal-01380163 ; ACM Computers in Entertainment, 2017, ⟨10.1145/3022635⟩ (2017)
|
|
BASE
|
|
12 |
Multimodal Person Discovery in Broadcast TV: lessons learned from MediaEval 2015
|
|
|
|
In: ISSN: 1380-7501 ; EISSN: 1573-7721 ; Multimedia Tools and Applications ; https://hal.archives-ouvertes.fr/hal-01690581 ; Multimedia Tools and Applications, Springer Verlag, 2017, 76 (21), pp.22547 - 22567. ⟨10.1007/s11042-017-4730-x⟩ (2017)
|
|
BASE
|
|
13 |
Enabling Embodied Analogies in Intelligent Music Systems ...
|
|
|
|
BASE
|
|
14 |
Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots
|
|
|
|
In: DyNo: 2nd International Workshop on Dynamics in Networks, in conjunction with the 2016 IEEE/ACM International Conference ASONAM ; https://hal.archives-ouvertes.fr/hal-01276708 ; DyNo: 2nd International Workshop on Dynamics in Networks, in conjunction with the 2016 IEEE/ACM International Conference ASONAM, Aug 2016, San Francisco, United States. pp.1111-1118, ⟨10.1109/ASONAM.2016.7752379⟩ (2016)
|
|
BASE
|
|
16 |
Hierarchical topic structuring: from dense segmentation to topically focused fragments via burst analysis
|
|
|
|
In: Recent Advances on Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-01186443 ; Recent Advances on Natural Language Processing, 2015, Hissar, Bulgaria (2015)
|
|
BASE
|
|
17 |
Temporal re-scoring vs. temporal descriptors for semantic indexing of videos
|
|
|
|
In: 13th International Workshop on Content-Based Multimedia Indexing (CBMI) ; https://hal.archives-ouvertes.fr/hal-01230719 ; 13th International Workshop on Content-Based Multimedia Indexing (CBMI), Jun 2015, Prague, Czech Republic. pp.1-4, ⟨10.1109/CBMI.2015.7153626⟩ (2015)
|
|
BASE
|
|
18 |
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology ...
|
|
|
|
BASE
|
|
19 |
Novel perspectives and approaches to video summarization
|
|
Guan, Genliang. - : The University of Sydney, 2015. : Faculty of Engineering and Information Technologies, School of Information Technologies, 2015
|
|
BASE
|
|
20 |
Planning Human-Computer Improvisation
|
|
|
|
In: International Computer Music Conference ; https://hal.archives-ouvertes.fr/hal-01053834 ; International Computer Music Conference, Sep 2014, Athens, Greece ; http://icmc14-smc14.net (2014)
|
|
BASE