81
Common Phone: A Multilingual Dataset for Robust Acoustic Modelling ...
BASE
82
Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end ...
83
Low-dimensional representation of infant and adult vocalization acoustics ...
84
Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character ...
85
Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives ...
86
Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition ...
87
Similarity and Content-based Phonetic Self Attention for Speech Recognition ...
88
BERT-LID: Leveraging BERT to Improve Spoken Language Identification ...
89
Chain-based Discriminative Autoencoders for Speech Recognition ...
90
Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis ...
91
STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation ...
92
Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model ...
95
Improving speaker de-identification with functional data analysis of f0 trajectories ...
96
Unsupervised word-level prosody tagging for controllable speech synthesis ...
97
Filter-based Discriminative Autoencoders for Children Speech Recognition ...
98
Transducer-based language embedding for spoken language identification ...
99
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 ...
Abstract:
Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on Fluencybank and the German therapy-centric Kassel State of Fluency (KSoF) dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, ... : Submitted to Interspeech 2022 ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering
URL: https://dx.doi.org/10.48550/arxiv.2204.03417 https://arxiv.org/abs/2204.03417
100
Multi-sequence Intermediate Conditioning for CTC-based ASR ...