101 |
Task and language in Spanish–English narratives (Wofford et al., 2022) ...
102 |
Task and language in Spanish–English narratives (Wofford et al., 2022) ...
103 |
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 ...
Abstract:
Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on Fluencybank and the German therapy-centric Kassel State of Fluency (KSoF) dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, ... : Submitted to Interspeech 2022 ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering
URL: https://dx.doi.org/10.48550/arxiv.2204.03417 https://arxiv.org/abs/2204.03417
104 |
Multi-sequence Intermediate Conditioning for CTC-based ASR ...
105 |
Code Switched and Code Mixed Speech Recognition for Indic languages ...
107 |
Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
108 |
Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
109 |
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
110 |
CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations ...
111 |
Enhance Language Identification using Dual-mode Model with Knowledge Distillation ...
112 |
MAESTRO: Matched Speech Text Representations through Modality Matching ...
115 |
Improving Non-native Word-level Pronunciation Scoring with Phone-level Mixup Data Augmentation and Multi-source Information ...
116 |
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
118 |
Sample, Translate, Recombine: Leveraging Audio Alignments for Data Augmentation in End-to-end Speech Translation ...
119 |
Wavebender GAN: An architecture for phonetically meaningful speech manipulation ...