
Hits 81 – 100 of 5,836

81	Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model ...
	Wang, Nick J. C.; Wang, Lu; Sun, Yandan; Kang, Haimei; Zhang, Dejun. - : arXiv, 2022
	Abstract: In spoken language understanding (SLU), what the user says is converted to his/her intent. Recent work on end-to-end SLU has shown that accuracy can be improved via pre-training approaches. We revisit ideas presented by Lugosch et al. using speech pre-training and three-module modeling; however, to ease construction of the end-to-end SLU model, we use as our phoneme module an open-source acoustic-phonetic model from a DNN-HMM hybrid automatic speech recognition (ASR) system instead of training one from scratch. Hence we fine-tune on speech only for the word module, and we apply multi-target learning (MTL) on the word and intent modules to jointly optimize SLU performance. MTL yields a relative reduction of 40% in intent-classification error rates (from 1.0% to 0.6%). Note that our three-module model is a streaming method. The final outcome of the proposed three-module modeling approach yields an intent accuracy of 99.4% on FluentSpeech, an intent error rate reduction of 50% compared to that of Lugosch et al. ... : Published in INTERSPEECH 2021 ...
	Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
	URL: https://arxiv.org/abs/2204.03315 https://dx.doi.org/10.48550/arxiv.2204.03315
	BASE

82	Speech segmentation using multilevel hybrid filters ...
	Faundez-Zanuy, Marcos; Vallverdu-Bayes, Francesc. - : arXiv, 2022
	BASE

83	On the relevance of language in speaker recognition ...
	Satue-Villar, Antonio; Faundez-Zanuy, Marcos. - : arXiv, 2022
	BASE

84	Improving speaker de-identification with functional data analysis of f0 trajectories ...
	Tavi, Lauri; Kinnunen, Tomi; Hautamäki, Rosa González. - : arXiv, 2022
	BASE

85	Unsupervised word-level prosody tagging for controllable speech synthesis ...
	Guo, Yiwei; Du, Chenpeng; Yu, Kai. - : arXiv, 2022
	BASE

86	Filter-based Discriminative Autoencoders for Children Speech Recognition ...
	Tai, Chiang-Lin; Lee, Hung-Shin; Tsao, Yu. - : arXiv, 2022
	BASE

87	Transducer-based language embedding for spoken language identification ...
	Shen, Peng; Lu, Xugang; Kawai, Hisashi. - : arXiv, 2022
	BASE

88	Effects of Spatial Speech Presentation on Listener Response Strategy for Talker-Identification ...
	Uhrig, Stefan; Perkis, Andrew; Möller, Sebastian. - : Technische Universität Berlin, 2022
	BASE

89	Audiovisual Maluma/Takete Effect ...
	Sidhu, David. - : Open Science Framework, 2022
	BASE

90	Multi-sequence Intermediate Conditioning for CTC-based ASR ...
	Fujita, Yusuke; Komatsu, Tatsuya; Kida, Yusuke. - : arXiv, 2022
	BASE

91	Simple and Effective Unsupervised Speech Synthesis ...
	Liu, Alexander H.; Lai, Cheng-I Jeff; Hsu, Wei-Ning. - : arXiv, 2022
	BASE

92	Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
	Sankar, Sanjana; Beautemps, Denis; Hueber, Thomas. - : arXiv, 2022
	BASE

93	Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
	Zhang, Cong; Zeng, Huinan; Liu, Huang. - : arXiv, 2022
	BASE

94	Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
	Peng, Puyuan; Harwath, David. - : arXiv, 2022
	BASE

95	CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations ...
	Sachidananda, Vin; Tseng, Shao-Yen; Marchi, Erik. - : arXiv, 2022
	BASE

96	Enhance Language Identification using Dual-mode Model with Knowledge Distillation ...
	Liu, Hexin; Perera, Leibny Paola Garcia; Khong, Andy W. H. - : arXiv, 2022
	BASE

97	MAESTRO: Matched Speech Text Representations through Modality Matching ...
	Chen, Zhehuai; Zhang, Yu; Rosenberg, Andrew. - : arXiv, 2022
	BASE

98	Cross-stitched Multi-modal Encoders ...
	Singla, Karan; Pressel, Daniel; Price, Ryan. - : arXiv, 2022
	BASE

99	Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
	Udagawa, Takuma; Suzuki, Masayuki; Kurata, Gakuto. - : arXiv, 2022
	BASE

100	ASR-Aware End-to-end Neural Diarization ...
	Khare, Aparna; Han, Eunjung; Yang, Yuguang. - : arXiv, 2022
	BASE

© 2013 - 2024 Lin|gu|is|tik