1 |
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities
|
|
|
|
In: ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22) ; https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618 ; 2022 (2022)
|
|
Abstract:
Whether to retrieve, answer, translate, or reason, multimodality opens up new challenges and perspectives. In this context, we are interested in answering questions about named entities grounded in a visual context using a Knowledge Base (KB). To benchmark this task, called KVQAE (Knowledge-based Visual Question Answering about named Entities), we provide ViQuAE, a dataset of 3.7K questions paired with images. This is the first KVQAE dataset to cover a wide range of entity types (e.g. persons, landmarks, and products). The dataset is annotated using a semi-automatic method. We also propose a KB composed of 1.5M Wikipedia articles paired with images. To set a baseline on the benchmark, we address KVQAE as a two-stage problem: Information Retrieval and Reading Comprehension, with both zero- and few-shot learning methods. The experiments empirically demonstrate the difficulty of the task, especially when questions are not about persons. This work paves the way for better multimodal entity representations and question answering. The dataset, KB, code, and semi-automatic annotation pipeline are freely available at https://github.com/PaulLerner/ViQuAE.
|
|
Keyword:
[INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM]; dataset; knowledge-based visual question answering; multimodal
|
|
URL: https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618 https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618/document https://doi.org/10.1145/3477495.3531753 https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618/file/lerner_sigir_2022_camera.pdf
|
|
BASE
|
|
|
|
2 |
Efficiency of Use of Internet Resources in Teaching a Foreign Language at Non-Linguistic Universities ...
|
|
|
|
BASE
|
|
|
|
3 |
Unsupervised quantification of entity consistency between photos and text in real-world news ...
|
|
Müller-Budack, Eric. - : Hannover : Institutionelles Repositorium der Leibniz Universität Hannover, 2022
|
|
BASE
|
|
|
|
4 |
The Role of Presentation in Teaching a Foreign Language in the Field of Professional Communication ...
|
|
|
|
BASE
|
|
|
|
5 |
Sign Language Recognition System using TensorFlow Object Detection API ...
|
|
|
|
BASE
|
|
|
|
6 |
hate-alert@DravidianLangTech-ACL2022: Ensembling Multi-Modalities for Tamil TrollMeme Classification ...
|
|
|
|
BASE
|
|
|
|
7 |
3MASSIV: Multilingual, Multimodal and Multi-Aspect dataset of Social Media Short Videos ...
|
|
|
|
BASE
|
|
|
|
8 |
Uni-EDEN: Universal Encoder-Decoder Network by Multi-Granular Vision-Language Pre-training ...
|
|
|
|
BASE
|
|
|
|
9 |
Chain-based Discriminative Autoencoders for Speech Recognition ...
|
|
|
|
BASE
|
|
|
|
10 |
Multimedia Interventions for Neurodiversity: Leveraging Insights from Developmental Cognitive Neuroscience to Build an Innovative Practice
|
|
|
|
In: Brain Sciences; Volume 12; Issue 2; Pages: 147 (2022)
|
|
BASE
|
|
|
|
11 |
COVID-19 and cyberbullying: deep ensemble model to identify cyberbullying from code-switched languages during the pandemic
|
|
|
|
In: Multimed Tools Appl (2022)
|
|
BASE
|
|
|
|
12 |
FaceTuneGAN: Face Autoencoder for Convolutional Expression Transfer Using Neural Generative Adversarial Networks
|
|
|
|
In: https://hal.inria.fr/hal-03462778 ; 2021 (2021)
|
|
BASE
|
|
|
|
13 |
The L2L system for second language learning using visualised zoom calls among students
|
|
|
|
In: Dey-Plissonneau, Aparajita, Lee, Hyowon orcid:0000-0003-4395-7702 , Pradier, Vincent orcid:0000-0002-7050-6408 , Scriney, Michael orcid:0000-0001-6813-2630 and Smeaton, Alan F. orcid:0000-0003-1028-8389 (2021) The L2L system for second language learning using visualised zoom calls among students. In: 16th European Conference on Technology-Enhanced Learning EC-TEL 2021, 20-24 Sept 2021, Bozen-Bolzano, Italy (Online). ISBN 978-3-030-86435-4 (2021)
|
|
BASE
|
|
|
|
14 |
Utilising visual attention cues for vehicle detection and tracking
|
|
|
|
In: Hu, Feiyan orcid:0000-0001-7451-6438 , Gurram Munirathnam, Venkatesh orcid:0000-0002-4393-9267 , O'Connor, Noel E. orcid:0000-0002-4033-9135 , Smeaton, Alan F. orcid:0000-0003-1028-8389 and Little, Suzanne orcid:0000-0003-3281-3471 (2021) Utilising visual attention cues for vehicle detection and tracking. In: 25th International Conference on Pattern Recognition (ICPR2020), 10-15 Jan 2021, Milan, Italy (Online). (2021)
|
|
BASE
|
|
|
|
15 |
Attention based video summaries of live online zoom classes
|
|
|
|
In: Lee, Hyowon orcid:0000-0003-4395-7702 , Liu, Mingming orcid:0000-0002-8988-2104 , Riaz, Hamza, Rajasekaran, Navaneethan, Scriney, Michael orcid:0000-0001-6813-2630 and Smeaton, Alan F. orcid:0000-0003-1028-8389 (2021) Attention based video summaries of live online zoom classes. In: AAAI-2021 Workshop on AI Education: "Imagining Post-COVID Education with AI" (TIPCE-2021)., 9 Feb 2021, Online (Vancouver, Canada). (In Press) (2021)
|
|
BASE
|
|
|
|
16 |
Supporting an effective review of telecollaboration for second language learning by visualising the participation and engagement at Dublin City University
|
|
|
|
In: Lee, Hyowon orcid:0000-0003-4395-7702 , Scriney, Michael orcid:0000-0001-6813-2630 , Dey-Plissonneau, Aparajita and Smeaton, Alan orcid:0000-0003-1028-8389 (2021) Supporting an effective review of telecollaboration for second language learning by visualising the participation and engagement at Dublin City University. In: Virtual Exchange in Higher Education: Charting the Irish Experience, 17 Sept 2021, Online vs MS Teams. (2021)
|
|
BASE
|
|
|
|
18 |
Leveraging lyrics from audio for MIR
|
|
|
|
In: https://tel.archives-ouvertes.fr/tel-03558515 ; Signal and Image processing. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAT027⟩ (2021)
|
|
BASE
|
|
|
|
19 |
Overview of LifeCLEF 2021: an evaluation of Machine-Learning based Species Identification and Species Distribution Prediction
|
|
|
|
In: Experimental IR Meets Multilinguality, Multimodality, and Interaction ; https://hal.inria.fr/hal-03415990 ; K. Selçuk Candan; Bogdan Ionescu; Lorraine Goeuriot; Birger Larsen; Henning Müller; Alexis Joly; Maria Maistro; Florina Piroi; Guglielmo Faggioli; Nicola Ferro. Experimental IR Meets Multilinguality, Multimodality, and Interaction, 12880, Springer International Publishing, pp.371-393, 2021, Lecture Notes in Computer Science, ⟨10.1007/978-3-030-85251-1_24⟩ (2021)
|
|
BASE
|
|
|
|
20 |
Simulating reading mistakes for child speech Transformer-based phone recognition
|
|
|
|
In: Annual Conference of the International Speech Communication Association (INTERSPEECH) ; https://hal.archives-ouvertes.fr/hal-03257870 ; Annual Conference of the International Speech Communication Association (INTERSPEECH), Aug 2021, Brno, Czech Republic (2021)
|
|
BASE
|
|
|
|
|
|