Hits 81 – 100 of 906

81	Unsupervised word-level prosody tagging for controllable speech synthesis ...
	Guo, Yiwei; Du, Chenpeng; Yu, Kai. - : arXiv, 2022
	BASE

82	Filter-based Discriminative Autoencoders for Children Speech Recognition ...
	Tai, Chiang-Lin; Lee, Hung-Shin; Tsao, Yu. - : arXiv, 2022
	BASE

83	Transducer-based language embedding for spoken language identification ...
	Shen, Peng; Lu, Xugang; Kawai, Hisashi. - : arXiv, 2022
	BASE

84	Multi-sequence Intermediate Conditioning for CTC-based ASR ...
	Fujita, Yusuke; Komatsu, Tatsuya; Kida, Yusuke. - : arXiv, 2022
	Abstract: End-to-end automatic speech recognition (ASR) directly maps input speech to a character sequence without using pronunciation lexica. However, in languages with thousands of characters, such as Japanese and Mandarin, modeling all these characters is problematic due to data scarcity. To alleviate the problem, we propose a multi-task learning model with explicit interaction between characters and syllables by utilizing the Self-conditioned connectionist temporal classification (CTC) technique. While the original Self-conditioned CTC estimates character-level intermediate predictions by applying auxiliary CTC losses to a set of intermediate layers, the proposed method additionally estimates syllable-level intermediate predictions in another set of intermediate layers. The character-level and syllable-level predictions are alternately used as conditioning features to deal with the mutual dependency between characters and syllables. Experimental results on Japanese and Mandarin datasets show that the proposed ... : This paper was submitted to INTERSPEECH 2022 ...
	Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
	URL: https://arxiv.org/abs/2204.00175 https://dx.doi.org/10.48550/arxiv.2204.00175
	BASE

85	Simple and Effective Unsupervised Speech Synthesis ...
	Liu, Alexander H.; Lai, Cheng-I Jeff; Hsu, Wei-Ning. - : arXiv, 2022
	BASE

86	Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
	Sankar, Sanjana; Beautemps, Denis; Hueber, Thomas. - : arXiv, 2022
	BASE

87	Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
	Zhang, Cong; Zeng, Huinan; Liu, Huang. - : arXiv, 2022
	BASE

88	Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
	Peng, Puyuan; Harwath, David. - : arXiv, 2022
	BASE

89	CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations ...
	Sachidananda, Vin; Tseng, Shao-Yen; Marchi, Erik. - : arXiv, 2022
	BASE

90	Enhance Language Identification using Dual-mode Model with Knowledge Distillation ...
	Liu, Hexin; Perera, Leibny Paola Garcia; Khong, Andy W. H. - : arXiv, 2022
	BASE

91	MAESTRO: Matched Speech Text Representations through Modality Matching ...
	Chen, Zhehuai; Zhang, Yu; Rosenberg, Andrew. - : arXiv, 2022
	BASE

92	Cross-stitched Multi-modal Encoders ...
	Singla, Karan; Pressel, Daniel; Price, Ryan. - : arXiv, 2022
	BASE

93	Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
	Udagawa, Takuma; Suzuki, Masayuki; Kurata, Gakuto. - : arXiv, 2022
	BASE

94	ASR-Aware End-to-end Neural Diarization ...
	Khare, Aparna; Han, Eunjung; Yang, Yuguang. - : arXiv, 2022
	BASE

95	Wavebender GAN: An architecture for phonetically meaningful speech manipulation ...
	Beck, Gustavo Teodoro Döhler; Wennberg, Ulme; Malisz, Zofia. - : arXiv, 2022
	BASE

96	Speaker Extraction with Co-Speech Gestures Cue ...
	Pan, Zexu; Qian, Xinyuan; Li, Haizhou. - : arXiv, 2022
	BASE

97	Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features ...
	Scharf, Maximilian Karl; Hochmuth, Sabine; Wong, Lena L. N. - : arXiv, 2022
	BASE

98	MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data ...
	Close, George; Hain, Thomas; Goetze, Stefan. - : arXiv, 2022
	BASE

99	DeepFry: Identifying Vocal Fry Using Deep Neural Networks ...
	Chernyak, Bronya R.; Simon, Talia Ben; Segal, Yael. - : arXiv, 2022
	BASE

100	MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis ...
	Lei, Yi; Yang, Shan; Wang, Xinsheng. - : arXiv, 2022
	BASE

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings