61 |
Filter-based Discriminative Autoencoders for Children Speech Recognition ...
BASE
62 |
Transducer-based language embedding for spoken language identification ...
63 |
Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 ...
Abstract:
Stuttering is a varied speech disorder that harms an individual's communication ability. Persons who stutter (PWS) often use speech therapy to cope with their condition. Improving speech recognition systems for people with such non-typical speech or tracking the effectiveness of speech therapy would require systems that can detect dysfluencies while at the same time being able to detect speech techniques acquired in therapy. This paper shows that fine-tuning wav2vec 2.0 for the classification of stuttering on a sizeable English corpus containing stuttered speech, in conjunction with multi-task learning, boosts the effectiveness of the general-purpose wav2vec 2.0 features for detecting stuttering in speech, both within and across languages. We evaluate our method on Fluencybank and the German therapy-centric Kassel State of Fluency (KSoF) dataset by training Support Vector Machine classifiers using features extracted from the fine-tuned models for six different stuttering-related event types: blocks, ... : Submitted to Interspeech 2022 ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering
URL: https://dx.doi.org/10.48550/arxiv.2204.03417 https://arxiv.org/abs/2204.03417
64 |
Multi-sequence Intermediate Conditioning for CTC-based ASR ...
65 |
Code Switched and Code Mixed Speech Recognition for Indic languages ...
67 |
Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
68 |
Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
69 |
MAESTRO: Matched Speech Text Representations through Modality Matching ...
72 |
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
75 |
Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features ...
76 |
Cochlear Implant Results in Older Adults with Post-Lingual Deafness: The Role of “Top-Down” Neurocognitive Mechanisms
In: International Journal of Environmental Research and Public Health; Volume 19; Issue 3; Pages: 1343 (2022)
77 |
MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension
In: Applied Sciences; Volume 12; Issue 2; Pages: 804 (2022)
78 |
On the Difference of Scoring in Speech in Babble Tests
In: Healthcare; Volume 10; Issue 3; Pages: 458 (2022)
79 |
An Empirical Performance Analysis of the Speak Correct Computerized Interface
In: Processes; Volume 10; Issue 3; Pages: 487 (2022)
80 |
DeepFry: Identifying Vocal Fry Using Deep Neural Networks ...