Hits 61 – 80 of 1,986

61	Filter-based Discriminative Autoencoders for Children Speech Recognition ...
	Tai, Chiang-Lin; Lee, Hung-Shin; Tsao, Yu. - : arXiv, 2022
	BASE

62	Transducer-based language embedding for spoken language identification ...
	Shen, Peng; Lu, Xugang; Kawai, Hisashi. - : arXiv, 2022
	BASE

63	Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 ...
	Bayerl, Sebastian P.; Wagner, Dominik; Nöth, Elmar. - : arXiv, 2022
	BASE

64	Multi-sequence Intermediate Conditioning for CTC-based ASR ...
	Fujita, Yusuke; Komatsu, Tatsuya; Kida, Yusuke. - : arXiv, 2022
	BASE

65	Code Switched and Code Mixed Speech Recognition for Indic languages ...
	Chadha, Harveen Singh; Shah, Priyanshi; Dhuriya, Ankur. - : arXiv, 2022
	BASE

66	Simple and Effective Unsupervised Speech Synthesis ...
	Liu, Alexander H.; Lai, Cheng-I Jeff; Hsu, Wei-Ning. - : arXiv, 2022
	BASE

67	Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
	Sankar, Sanjana; Beautemps, Denis; Hueber, Thomas. - : arXiv, 2022
	BASE

68	Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
	Zhang, Cong; Zeng, Huinan; Liu, Huang. - : arXiv, 2022
	BASE

69	MAESTRO: Matched Speech Text Representations through Modality Matching ...
	Chen, Zhehuai; Zhang, Yu; Rosenberg, Andrew. - : arXiv, 2022
	BASE

70	Improving Language Identification of Accented Speech ...
	Kukk, Kunnar; Alumäe, Tanel. - : arXiv, 2022
	BASE

71	Cross-stitched Multi-modal Encoders ...
	Singla, Karan; Pressel, Daniel; Price, Ryan. - : arXiv, 2022
	BASE

72	Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
	Udagawa, Takuma; Suzuki, Masayuki; Kurata, Gakuto. - : arXiv, 2022
	BASE

73	UK-South Korea Prosody Research Network ...
	Jeon, Hae-Sung. - : Open Science Framework, 2022
	BASE

74	Speaker Extraction with Co-Speech Gestures Cue ...
	Pan, Zexu; Qian, Xinyuan; Li, Haizhou. - : arXiv, 2022
	BASE

75	Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features ...
	Scharf, Maximilian Karl; Hochmuth, Sabine; Wong, Lena L. N.. - : arXiv, 2022
	BASE

76	Cochlear Implant Results in Older Adults with Post-Lingual Deafness: The Role of “Top-Down” Neurocognitive Mechanisms
	Milena Zucca; Andrea Albera; Roberto Albera; Carla Montuschi; Beatrice Della Gatta; Andrea Canale; Innocenzo Rainero
	In: International Journal of Environmental Research and Public Health; Volume 19; Issue 3; Pages: 1343 (2022)
	BASE

77	MLLP-VRAIN Spanish ASR Systems for the Albayzín-RTVE 2020 Speech-to-Text Challenge: Extension
	Pau Baquero-Arnal; Javier Jorge; Adrià Giménez; Javier Iranzo-Sánchez; Alejandro Pérez; Gonçal Vicent Garcés Díaz-Munío; Joan Albert Silvestre-Cerdà; Jorge Civera; Albert Sanchis; Alfons Juan
	In: Applied Sciences; Volume 12; Issue 2; Pages: 804 (2022)
	Abstract: This paper describes the automatic speech recognition (ASR) systems built by the MLLP-VRAIN research group of Universitat Politècnica de València for the Albayzín-RTVE 2020 Speech-to-Text Challenge, and includes an extension of the work consisting of building and evaluating equivalent systems under the closed data conditions from the 2018 challenge. The primary system (p-streaming_1500ms_nlt) was a hybrid ASR system using streaming one-pass decoding with a context window of 1.5 seconds. This system achieved 16.0% WER on the test-2020 set. We also submitted three contrastive systems. From these, we highlight the system c2-streaming_600ms_t which, following a similar configuration as the primary system with a smaller context window of 0.6 s, scored 16.9% WER points on the same test set, with a measured empirical latency of 0.81 ± 0.09 s (mean ± stdev). That is, we obtained state-of-the-art latencies for high-quality automatic live captioning with a small WER degradation of 6% relative. As an extension, the equivalent closed-condition systems obtained 23.3% WER and 23.5% WER, respectively. When evaluated with an unconstrained language model, we obtained 19.9% WER and 20.4% WER; i.e., not far behind the top-performing systems with only 5% of the full acoustic data and with the extra ability of being streaming-capable. Indeed, all of these streaming systems could be put into production environments for automatic captioning of live media streams.
	Keyword: automatic speech recognition; natural language processing; streaming
	URL: https://doi.org/10.3390/app12020804
	BASE

78	On the Difference of Scoring in Speech in Babble Tests
	Afroditi Sereti; Christos Sidiras; Nikos Eleftheriadis; Ioannis Nimatoudis; Gail D. Chermak; Vasiliki Maria Iliadou
	In: Healthcare; Volume 10; Issue 3; Pages: 458 (2022)
	BASE

79	An Empirical Performance Analysis of the Speak Correct Computerized Interface
	Kamal Jambi; Hassanin Al-Barhamtoshy; Wajdi Al-Jedaibi; Mohsen Rashwan; Sherif Abdou
	In: Processes; Volume 10; Issue 3; Pages: 487 (2022)
	BASE

80	DeepFry: Identifying Vocal Fry Using Deep Neural Networks ...
	Chernyak, Bronya R.; Simon, Talia Ben; Segal, Yael. - : arXiv, 2022
	BASE

© 2013 - 2024 Lin|gu|is|tik