61. Filter-based Discriminative Autoencoders for Children Speech Recognition ...
62. Transducer-based language embedding for spoken language identification ...
63. Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 ...
64. Multi-sequence Intermediate Conditioning for CTC-based ASR ...
65. Code Switched and Code Mixed Speech Recognition for Indic languages ...
67. Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
68. Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
69. MAESTRO: Matched Speech Text Representations through Modality Matching ...
72. Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
75. Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features ...
76. Cochlear Implant Results in Older Adults with Post-Lingual Deafness: The Role of “Top-Down” Neurocognitive Mechanisms
In: International Journal of Environmental Research and Public Health; Volume 19; Issue 3; Pages: 1343 (2022)
77. MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension
In: Applied Sciences; Volume 12; Issue 2; Pages: 804 (2022)
Abstract:
This paper describes the automatic speech recognition (ASR) systems built by the MLLP-VRAIN research group of Universitat Politècnica de València for the Albayzín-RTVE 2020 Speech-to-Text Challenge, and extends that work by building and evaluating equivalent systems under the closed data conditions of the 2018 challenge. The primary system (p-streaming_1500ms_nlt) was a hybrid ASR system using streaming one-pass decoding with a context window of 1.5 seconds. This system achieved 16.0% WER on the test-2020 set. We also submitted three contrastive systems. Of these, we highlight c2-streaming_600ms_t, which follows a configuration similar to that of the primary system but with a smaller context window of 0.6 s; it scored 16.9% WER on the same test set, with a measured empirical latency of 0.81 ± 0.09 s (mean ± stdev). That is, we obtained state-of-the-art latencies for high-quality automatic live captioning with a small relative WER degradation of 6%. As an extension, the equivalent closed-condition systems obtained 23.3% WER and 23.5% WER, respectively. When evaluated with an unconstrained language model, we obtained 19.9% WER and 20.4% WER, i.e., not far behind the top-performing systems while using only 5% of the full acoustic data and with the extra ability of being streaming-capable. Indeed, all of these streaming systems could be put into production environments for automatic captioning of live media streams.
Keywords: automatic speech recognition; natural language processing; streaming
URL: https://doi.org/10.3390/app12020804
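
Assuming the "6% relative" WER degradation in the abstract of entry 77 is the WER increase of the 0.6 s contrastive system over the 1.5 s primary system, divided by the primary system's WER, the arithmetic can be checked with a minimal sketch. It uses only the two test-2020 figures quoted in the abstract; the helper name `relative_wer_change` is hypothetical and not taken from the paper.

```python
def relative_wer_change(baseline_wer: float, new_wer: float) -> float:
    """Return the change in WER as a fraction of the baseline WER.

    Positive values mean the new system has a higher (worse) WER.
    Hypothetical helper for illustration only.
    """
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (new_wer - baseline_wer) / baseline_wer


# WER figures (in %) as quoted in the abstract for the test-2020 set.
primary_wer = 16.0      # p-streaming_1500ms_nlt, 1.5 s context window
contrastive_wer = 16.9  # c2-streaming_600ms_t, 0.6 s context window

degradation = relative_wer_change(primary_wer, contrastive_wer)
print(f"absolute: {contrastive_wer - primary_wer:.1f} WER points")
print(f"relative: {degradation:.1%}")
```

This prints an absolute gap of 0.9 WER points and a relative degradation of 5.6%, which is consistent with the 6% stated in the abstract after rounding.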
78. On the Difference of Scoring in Speech in Babble Tests
In: Healthcare; Volume 10; Issue 3; Pages: 458 (2022)
79. An Empirical Performance Analysis of the Speak Correct Computerized Interface
In: Processes; Volume 10; Issue 3; Pages: 487 (2022)
80. DeepFry: Identifying Vocal Fry Using Deep Neural Networks ...