1. ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities
In: ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22); https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618 (2022)
BASE

2. A Novel Multimodal Approach for Studying the Dynamics of Curiosity in Small Group Learning
In: https://hal.inria.fr/hal-03536340 (2022)
BASE

3. Evaluating multimodal literacy: Academic and professional interactions around student-produced instructional video tutorials
In: ISSN: 0346-251X; System; https://hal.archives-ouvertes.fr/hal-03521668; System, Elsevier, 2022, 105, pp. 102727. ⟨10.1016/j.system.2022.102727⟩ (2022)
BASE

4. Dance, multilingual repertoires and the Italian landscape: asylum seekers’ narratives in an arts-based project
BASE

5. An inquiry into the development of critical text creators: Teaching grammar in the primary years ...
BASE

6. The Korsakow platform and nonlinear narratives as a means to enhance foreign language learning in HE
BASE

7. Mutual Understanding in Situated Interactions with Conversational User Interfaces: Theory, Studies, and Computation
BASE

8. Multimodal Lip-Reading for Tracheostomy Patients in the Greek Language
In: Computers; Volume 11; Issue 3; Pages: 34 (2022)
BASE

9. On Semiotics Perspectives of Computational Thinking: Unravelling the “Pamphlet” Approach, a Case Study
In: Sustainability; Volume 14; Issue 4; Pages: 1956 (2022)
BASE

10. Segmentation of Glottal Images from High-Speed Videoendoscopy Optimized by Synchronous Acoustic Recordings
In: Sensors; Volume 22; Issue 5; Pages: 1751 (2022)
BASE

11. Recognition of the Mental Workloads of Pilots in the Cockpit Using EEG Signals
In: Applied Sciences; Volume 12; Issue 5; Pages: 2298 (2022)
BASE

12. Beyond the Edge: Markerless Pose Estimation of Speech Articulators from Ultrasound and Camera Images Using DeepLabCut
In: Sensors; Volume 22; Issue 3; Pages: 1133 (2022)
BASE

13. Emotion Classification from Speech and Text in Videos Using a Multimodal Approach
In: Multimodal Technologies and Interaction; Volume 6; Issue 4; Pages: 28 (2022)
BASE

14. Primary Pupils’ Multimodal Representations in Worksheets—Text Work in Science Education
In: Education Sciences; Volume 12; Issue 3; Pages: 221 (2022)
BASE

15. Realistic Image Generation from Text by Using BERT-Based Embedding
In: Electronics; Volume 11; Issue 5; Pages: 764 (2022)
BASE

16. Framing right-wing populist satire: the case-study of Ghisberto's cartoons in Italy
In: Punctum: International Journal of Semiotics; 6; 2; 29-55; Semiotics of Political Communication (2022)
BASE

18. Made in China versus Made in Spain. A corpus-driven study comparing AD in Chinese and Spanish
BASE

19. Towards the new construct of academic English in the digital age
BASE

20. Beyond the edge: Markerless pose estimation of speech articulators from ultrasound and camera images using DeepLabCut
Abstract:
Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, trained using DeepLabCut and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93, s.d. 0.46 mm, compared with 0.96, s.d. 0.39 mm, for two human labellers, and 2.3, s.d. 1.5 mm, for the best performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation among three physical sensor positions and the corresponding estimated keypoints and requires further investigation. The accuracy of estimating lip aperture from a camera video was high, with a mean MSD of 0.70, s.d. 0.56 mm, compared with 0.57, s.d. 0.48 mm, for two human labellers. DeepLabCut was found to be a fast, accurate and fully automatic method of providing unique kinematic data for tongue, hyoid, jaw, and lips.
In: Sensors; Volume 22; Issue 3
Keyword:
Keypoints; Landmarks; Lip Reading; Multimodal Speech; Pose Estimation; Speech Kinematics; Ultrasound Tongue Imaging
URL: https://hdl.handle.net/20.500.12289/11795 https://doi.org/10.3390/s22031133 https://eresearch.qmu.ac.uk/handle/20.500.12289/11795
BASE