
Hits 41 – 60 of 989

41	Unsupervised Data Selection via Discrete Speech Representation for ASR ...
	Lu, Zhiyun; Wang, Yongqiang; Zhang, Yu. - : arXiv, 2022
	BASE

42	Analysis of Voice Conversion and Code-Switching Synthesis Using VQ-VAE ...
	Das, Shuvayanti; Williams, Jennifer; Lai, Catherine. - : arXiv, 2022
	BASE

43	CVSS Corpus and Massively Multilingual Speech-to-Speech Translation ...
	Jia, Ye; Ramanovich, Michelle Tadmor; Wang, Quan. - : arXiv, 2022
	BASE

44	ADIMA: Abuse Detection In Multilingual Audio ...
	Gupta, Vikram; Sharon, Rini; Sawhney, Ramit. - : arXiv, 2022
	BASE

45	Improving the fusion of acoustic and text representations in RNN-T ...
	Zhang, Chao; Li, Bo; Lu, Zhiyun. - : arXiv, 2022
	BASE

46	Data and knowledge-driven approaches for multilingual training to improve the performance of speech recognition systems of Indian languages ...
	Madhavaraj, A.; Ganesan, Ramakrishnan Angarai. - : arXiv, 2022
	BASE

47	Frequency-Directional Attention Model for Multilingual Automatic Speech Recognition ...
	Dobashi, Akihiro; Leow, Chee Siang; Nishizaki, Hiromitsu. - : arXiv, 2022
	BASE

48	Tackling data scarcity in speech translation using zero-shot multilingual machine translation techniques ...
	Dinh, Tu Anh; Liu, Danni; Niehues, Jan. - : arXiv, 2022
	BASE

49	AVQVC: One-shot Voice Conversion by Vector Quantization with applying contrastive learning ...
	Tang, Huaizhen; Zhang, Xulong; Wang, Jianzong. - : arXiv, 2022
	BASE

50	Multimodal Clustering with Role Induced Constraints for Speaker Diarization ...
	Flemotomos, Nikolaos; Narayanan, Shrikanth. - : arXiv, 2022
	BASE

51	Cross-view Brain Decoding ...
	Oota, Subba Reddy; Arora, Jashn; Gupta, Manish. - : arXiv, 2022
	BASE

52	Freeform Body Motion Generation from Speech ...
	Xu, Jing; Zhang, Wei; Bai, Yalong. - : arXiv, 2022
	BASE

53	Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition ...
	Shao, Qijie; Yan, Jinghao; Kang, Jian. - : arXiv, 2022
	BASE

54	Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset ...
	Yu, Tiezheng; Frieske, Rita; Xu, Peng. - : arXiv, 2022
	BASE

55	WavThruVec: Latent speech representation as intermediate features for neural speech synthesis ...
	Siuzdak, Hubert; Dura, Piotr; van Rijn, Pol. - : arXiv, 2022
	BASE

56	Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers ...
	Kubo, Yotaro; Karita, Shigeki; Bacchiani, Michiel. - : arXiv, 2022
	BASE

57	The VoicePrivacy 2022 Challenge Evaluation Plan ...
	Tomashenko, Natalia; Wang, Xin; Miao, Xiaoxiao. - : arXiv, 2022
	BASE

58	A Character-level Span-based Model for Mandarin Prosodic Structure Prediction ...
	Chen, Xueyuan; Song, Changhe; Zhou, Yixuan; Wu, Zhiyong; Chen, Changbin; Wu, Zhongqin; Meng, Helen. - : arXiv, 2022
	Abstract: The accuracy of prosodic structure prediction is crucial to the naturalness of synthesized speech in Mandarin text-to-speech system, but now is limited by widely-used sequence-to-sequence framework and error accumulation from previous word segmentation results. In this paper, we propose a span-based Mandarin prosodic structure prediction model to obtain an optimal prosodic structure tree, which can be converted to corresponding prosodic label sequence. Instead of the prerequisite for word segmentation, rich linguistic features are provided by Chinese character-level BERT and sent to encoder with self-attention architecture. On top of this, span representation and label scoring are used to describe all possible prosodic structure trees, of which each tree has its corresponding score. To find the optimal tree with the highest score for a given sentence, a bottom-up CKY-style algorithm is further used. The proposed method can predict prosodic labels of different levels at the same time and accomplish the ... : Accepted by ICASSP 2022 ...
	Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
	URL: https://arxiv.org/abs/2203.16922 https://dx.doi.org/10.48550/arxiv.2203.16922
	BASE

59	CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition ...
	Chen, Chengxin; Zhang, Pengyuan. - : arXiv, 2022
	BASE

60	Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model ...
	Shen, Ying; Yang, Huiyu; Lin, Lin. - : arXiv, 2022
	BASE

