Home Catalogue search

eng

Refine your search:

Search in the Catalogues and Directories






	Sort by
Simple Search

Page: 1 2 3 4 5 6 7 8 9...46

Hits 81 – 100 of 906

81	Unsupervised word-level prosody tagging for controllable speech synthesis ...
	Guo, Yiwei; Du, Chenpeng; Yu, Kai. - : arXiv, 2022
	BASE
	Show details

82	Filter-based Discriminative Autoencoders for Children Speech Recognition ...
	Tai, Chiang-Lin; Lee, Hung-Shin; Tsao, Yu. - : arXiv, 2022
	BASE
	Show details

83	Transducer-based language embedding for spoken language identification ...
	Shen, Peng; Lu, Xugang; Kawai, Hisashi. - : arXiv, 2022
	Abstract: The acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features that lack the usage of explicit linguistic feature encoding. In this paper, we propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefiting from the advantages of the RNN transducer's linguistic representation capability, the proposed method can exploit both phonetically-aware acoustic features and explicit linguistic features for LID tasks. Experiments were carried out on the large-scale multilingual LibriSpeech and VoxLingua107 datasets. Experimental results showed the proposed method significantly improves the performance on LID tasks with 12% to 59% and 16% to 24% relative improvement on in-domain and cross-domain datasets, respectively. ... : This paper was submitted to Interspeech 2022 ...
	Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
	URL: https://dx.doi.org/10.48550/arxiv.2204.03888 https://arxiv.org/abs/2204.03888
	BASE
	Hide details

84	Multi-sequence Intermediate Conditioning for CTC-based ASR ...
	Fujita, Yusuke; Komatsu, Tatsuya; Kida, Yusuke. - : arXiv, 2022
	BASE
	Show details

85	Simple and Effective Unsupervised Speech Synthesis ...
	Liu, Alexander H.; Lai, Cheng-I Jeff; Hsu, Wei-Ning. - : arXiv, 2022
	BASE
	Show details

86	Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
	Sankar, Sanjana; Beautemps, Denis; Hueber, Thomas. - : arXiv, 2022
	BASE
	Show details

87	Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
	Zhang, Cong; Zeng, Huinan; Liu, Huang. - : arXiv, 2022
	BASE
	Show details

88	Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
	Peng, Puyuan; Harwath, David. - : arXiv, 2022
	BASE
	Show details

89	CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations ...
	Sachidananda, Vin; Tseng, Shao-Yen; Marchi, Erik. - : arXiv, 2022
	BASE
	Show details

90	Enhance Language Identification using Dual-mode Model with Knowledge Distillation ...
	Liu, Hexin; Perera, Leibny Paola Garcia; Khong, Andy W. H.. - : arXiv, 2022
	BASE
	Show details

91	MAESTRO: Matched Speech Text Representations through Modality Matching ...
	Chen, Zhehuai; Zhang, Yu; Rosenberg, Andrew. - : arXiv, 2022
	BASE
	Show details

92	Cross-stitched Multi-modal Encoders ...
	Singla, Karan; Pressel, Daniel; Price, Ryan. - : arXiv, 2022
	BASE
	Show details

93	Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
	Udagawa, Takuma; Suzuki, Masayuki; Kurata, Gakuto. - : arXiv, 2022
	BASE
	Show details

94	ASR-Aware End-to-end Neural Diarization ...
	Khare, Aparna; Han, Eunjung; Yang, Yuguang. - : arXiv, 2022
	BASE
	Show details

95	Wavebender GAN: An architecture for phonetically meaningful speech manipulation ...
	Beck, Gustavo Teodoro Döhler; Wennberg, Ulme; Malisz, Zofia. - : arXiv, 2022
	BASE
	Show details

96	Speaker Extraction with Co-Speech Gestures Cue ...
	Pan, Zexu; Qian, Xinyuan; Li, Haizhou. - : arXiv, 2022
	BASE
	Show details

97	Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features ...
	Scharf, Maximilian Karl; Hochmuth, Sabine; Wong, Lena L. N.. - : arXiv, 2022
	BASE
	Show details

98	MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data ...
	Close, George; Hain, Thomas; Goetze, Stefan. - : arXiv, 2022
	BASE
	Show details

99	DeepFry: Identifying Vocal Fry Using Deep Neural Networks ...
	Chernyak, Bronya R.; Simon, Talia Ben; Segal, Yael. - : arXiv, 2022
	BASE
	Show details

100	MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis ...
	Lei, Yi; Yang, Shan; Wang, Xinsheng. - : arXiv, 2022
	BASE
	Show details

Page: 1 2 3 4 5 6 7 8 9...46

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Datenschutzeinstellungen ändern