1 | Beyond the edge: Markerless pose estimation of speech articulators from ultrasound and camera images using DeepLabCut

Abstract:
Automatic feature extraction from images of speech articulators is currently achieved by detecting edges. Here, we investigate the use of pose-estimation deep neural nets with transfer learning to perform markerless estimation of speech articulator keypoints using only a few hundred hand-labelled images as training input. Midsagittal ultrasound images of the tongue, jaw, and hyoid and camera images of the lips were hand-labelled with keypoints, trained using DeepLabCut, and evaluated on unseen speakers and systems. Tongue surface contours interpolated from estimated and hand-labelled keypoints produced an average mean sum of distances (MSD) of 0.93, s.d. 0.46 mm, compared with 0.96, s.d. 0.39 mm, for two human labellers, and 2.3, s.d. 1.5 mm, for the best-performing edge detection algorithm. A pilot set of simultaneous electromagnetic articulography (EMA) and ultrasound recordings demonstrated partial correlation between three physical sensor positions and the corresponding estimated keypoints; this requires further investigation. The accuracy of estimating lip aperture from camera video was high, with a mean MSD of 0.70, s.d. 0.56 mm, compared with 0.57, s.d. 0.48 mm, for two human labellers. DeepLabCut was found to be a fast, accurate, and fully automatic method of providing unique kinematic data for the tongue, hyoid, jaw, and lips. (Sensors 22(3), https://doi.org/10.3390/s22031133)
Keywords:
Keypoints; Landmarks; Lip Reading; Multimodal Speech; Pose Estimation; Speech Kinematics; Ultrasound Tongue Imaging

URL: https://hdl.handle.net/20.500.12289/11795 https://doi.org/10.3390/s22031133 https://eresearch.qmu.ac.uk/handle/20.500.12289/11795

BASE

3 | The impact of real-time articulatory information on phonetic transcription: Ultrasound-aided transcription in cleft lip and palate speech

4 | Enabling new articulatory gestures in children with persistent speech sound disorders using ultrasound visual biofeedback

5 | UltraSuite: A Repository of Ultrasound and Acoustic Data from Child Speech Therapy Sessions

6 | Covert contrast and covert errors in persistent velar fronting

8 | Using ultrasound visual biofeedback to treat persistent primary speech sound disorders

9 | Helping children learn non-native articulations: The implications for ultrasound-based clinical intervention

10 | Towards a 3D Tongue model for parameterising ultrasound data

13 | Recording speech articulation in dialogue: Evaluating a synchronized double Electromagnetic Articulography setup

14 | Comparing articulatory images: An MRI / Ultrasound Tongue Image database

16 | Head-Probe stabilisation in ultrasound tongue imaging using a headset to permit natural head movement.

17 | High-speed Cineloop Ultrasound vs. Video Ultrasound Tongue Imaging: Comparison of Front and Back Lingual Gesture Location and Relative Timing.

18 | Protocol for Restricting Head Movement when Recording Ultrasound Images of Speech