1 |
ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities
|
|
|
|
In: ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22) ; https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618 ; 2022 (2022)
|
|
BASE
|
|
|
|
2 |
Unsupervised quantification of entity consistency between photos and text in real-world news ...
|
|
Müller-Budack, Eric. - : Hannover : Institutionelles Repositorium der Leibniz Universität Hannover, 2022
|
|
BASE
|
|
|
|
3 |
Supporting an effective review of telecollaboration for second language learning by visualising the participation and engagement at Dublin City University
|
|
|
|
In: Lee, Hyowon orcid:0000-0003-4395-7702 , Scriney, Michael orcid:0000-0001-6813-2630 , Dey-Plissonneau, Aparajita and Smeaton, Alan orcid:0000-0003-1028-8389 (2021) Supporting an effective review of telecollaboration for second language learning by visualising the participation and engagement at Dublin City University. In: Virtual Exchange in Higher Education: Charting the Irish Experience, 17 Sept 2021, Online vs MS Teams. (2021)
|
|
BASE
|
|
|
|
4 |
Sign and Search: Sign Search Functionality for Sign Language Lexica ...
|
|
|
|
BASE
|
|
|
|
5 |
Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text ...
|
|
|
|
BASE
|
|
|
|
6 |
Recommending Themes for Ad Creative Design via Visual-Linguistic Representations ...
|
|
|
|
BASE
|
|
|
|
7 |
Fuzzy Logic Based Integration of Web Contextual Linguistic Structures for Enriching Conceptual Visual Representations ...
|
|
|
|
BASE
|
|
|
|
8 |
MusicTM-Dataset for Joint Representation Learning among Sheet Music, Lyrics, and Musical Audio ...
|
|
|
|
BASE
|
|
|
|
9 |
Utilization of multimodal interaction signals for automatic summarisation of academic presentations
|
|
Curtis, Keith. - : Dublin City University. School of Computing, 2018
|
|
In: Curtis, Keith (2018) Utilization of multimodal interaction signals for automatic summarisation of academic presentations. PhD thesis, Dublin City University. (2018)
|
|
Abstract:
Multimedia archives are expanding rapidly. For these, there exists a shortage of retrieval and summarisation techniques for accessing and browsing content where the main information exists in the audio stream. This thesis describes an investigation into the development of novel feature extraction and summarisation techniques for audio-visual recordings of academic presentations. We report on the development of a multimodal dataset of academic presentations. This dataset is labelled by human annotators for presentation ratings, audience engagement levels, speaker emphasis, and audience comprehension. We investigate the automatic classification of speaker ratings and audience engagement by extracting audio-visual features from video of the presenter and audience and training classifiers to predict speaker ratings and engagement levels. Following this, we investigate automatic identification of areas of emphasised speech. By analysing all human-annotated areas of emphasised speech, minimum speech pitch and gesticulation are identified as indicating emphasised speech when occurring together. Investigations are conducted into the speaker's potential to be comprehended by the audience. Following crowdsourced annotation of comprehension levels during academic presentations, a set of audio-visual features considered most likely to affect comprehension levels is extracted. Classifiers are trained on these features, and comprehension levels could be predicted over a 7-class scale to an accuracy of 49%, and over a binary distribution to an accuracy of 85%. Presentation summaries are built by segmenting speech transcripts into phrases and ranking the segments using keywords extracted from the transcripts in conjunction with extracted paralinguistic features. The highest-ranking segments are then extracted to build presentation summaries. Summaries are evaluated by performing eye-tracking experiments as participants watch presentation videos. Participants were found to be consistently more engaged with presentation summaries than with full presentations. Summaries were also found to contain a higher concentration of new information than full presentations.
|
|
Keyword:
Digital video; Evaluation; Eye Tracking; Feature Classification; Image processing; Information retrieval; Interactive computer systems; Multimedia systems; Video Summarisation
|
|
URL: http://doras.dcu.ie/22411/
|
|
BASE
|
|
|
|
10 |
Multimodal Machine Translation with Reinforcement Learning ...
|
|
|
|
BASE
|
|
|
|
11 |
ImproteK: introducing scenarios into human-computer music improvisation
|
|
|
|
In: ACM Computers in Entertainment ; https://hal.archives-ouvertes.fr/hal-01380163 ; ACM Computers in Entertainment, 2017, ⟨10.1145/3022635⟩ (2017)
|
|
BASE
|
|
|
|
12 |
Multimodal Person Discovery in Broadcast TV: lessons learned from MediaEval 2015
|
|
|
|
In: ISSN: 1380-7501 ; EISSN: 1573-7721 ; Multimedia Tools and Applications ; https://hal.archives-ouvertes.fr/hal-01690581 ; Multimedia Tools and Applications, Springer Verlag, 2017, 76 (21), pp.22547 - 22567. ⟨10.1007/s11042-017-4730-x⟩ (2017)
|
|
BASE
|
|
|
|
13 |
Enabling Embodied Analogies in Intelligent Music Systems ...
|
|
|
|
BASE
|
|
|
|
14 |
Narrative Smoothing: Dynamic Conversational Network for the Analysis of TV Series Plots
|
|
|
|
In: DyNo: 2nd International Workshop on Dynamics in Networks, in conjunction with the 2016 IEEE/ACM International Conference ASONAM ; https://hal.archives-ouvertes.fr/hal-01276708 ; DyNo: 2nd International Workshop on Dynamics in Networks, in conjunction with the 2016 IEEE/ACM International Conference ASONAM, Aug 2016, San Francisco, United States. pp.1111-1118, ⟨10.1109/ASONAM.2016.7752379⟩ (2016)
|
|
BASE
|
|
|
|
16 |
Hierarchical topic structuring: from dense segmentation to topically focused fragments via burst analysis
|
|
|
|
In: Recent Advances on Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-01186443 ; Recent Advances on Natural Language Processing, 2015, Hissar, Bulgaria (2015)
|
|
BASE
|
|
|
|
17 |
Temporal re-scoring vs. temporal descriptors for semantic indexing of videos
|
|
|
|
In: 13th International Workshop on Content-Based Multimedia Indexing (CBMI) ; https://hal.archives-ouvertes.fr/hal-01230719 ; 13th International Workshop on Content-Based Multimedia Indexing (CBMI), Jun 2015, Prague, Czech Republic. pp.1-4, ⟨10.1109/CBMI.2015.7153626⟩ (2015)
|
|
BASE
|
|
|
|
18 |
Visual Affect Around the World: A Large-scale Multilingual Visual Sentiment Ontology ...
|
|
|
|
BASE
|
|
|
|
19 |
Novel perspectives and approaches to video summarization
|
|
Guan, Genliang. - : The University of Sydney, 2015. : Faculty of Engineering and Information Technologies, School of Information Technologies, 2015
|
|
BASE
|
|
|
|
20 |
Planning Human-Computer Improvisation
|
|
|
|
In: International Computer Music Conference ; https://hal.archives-ouvertes.fr/hal-01053834 ; International Computer Music Conference, Sep 2014, Athens, Greece ; http://icmc14-smc14.net (2014)
|
|
BASE
|
|
|
|
|
|