Page: 1 2 3 4 5 6 7 8 9... 429
82. Conceptual Modeling of Events Based on One-Category Ontology
83. Linguistic-Acoustic Similarity Based Accent Shift for Accent Recognition
84. Automatic Speech Recognition Datasets in Cantonese: A Survey and New Dataset
85. WavThruVec: Latent speech representation as intermediate features for neural speech synthesis
86. Knowledge Transfer from Large-scale Pretrained Language Models to End-to-end Speech Recognizers
88. A Character-level Span-based Model for Mandarin Prosodic Structure Prediction
90. CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition

Abstract:
Previous research has looked into ways to improve speech emotion recognition (SER) by utilizing both acoustic and linguistic cues of speech. However, the potential association between state-of-the-art ASR models and the SER task has yet to be investigated. In this paper, we propose a novel channel and temporal-wise attention RNN (CTA-RNN) architecture based on the intermediate representations of pre-trained ASR models. Specifically, the embeddings of a large-scale pre-trained end-to-end ASR encoder contain both acoustic and linguistic information, as well as the ability to generalize to different speakers, making them well suited for the downstream SER task. To further exploit the embeddings from different layers of the ASR encoder, we propose a novel CTA-RNN architecture to capture the emotionally salient parts of the embeddings in both the channel and temporal directions. We evaluate our approach on two popular benchmark datasets, IEMOCAP and MSP-IMPROV, using both within-corpus and cross-corpus settings. ...

Comments: 5 pages, 2 figures, submitted to INTERSPEECH 2022

Keywords: Audio and Speech Processing (eess.AS); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; Machine Learning (cs.LG); Sound (cs.SD)
URL: https://dx.doi.org/10.48550/arxiv.2203.17023 https://arxiv.org/abs/2203.17023
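The abstract above describes attention applied to ASR-encoder embeddings in both the channel and temporal directions. The paper's exact architecture is not reproduced in this record, so the following is only a minimal numpy sketch of combined channel- and temporal-wise attention pooling over a (frames × channels) embedding matrix; the function names and the learned scoring vectors `w_c` and `w_t` are hypothetical stand-ins for trained parameters.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cta_pool(embeddings, w_c, w_t):
    """Illustrative channel- and temporal-wise attention pooling.

    embeddings: (T, C) ASR-encoder features for one utterance.
    w_c: (C,) hypothetical learned channel-scoring parameters.
    w_t: (C,) hypothetical learned temporal-scoring parameters.
    Returns a (C,) utterance-level vector for an emotion classifier.
    """
    # Channel attention: weight each feature dimension by global importance.
    chan_scores = embeddings.mean(axis=0) * w_c      # (C,)
    chan_weights = softmax(chan_scores)              # (C,), sums to 1
    reweighted = embeddings * chan_weights           # (T, C), broadcast over frames
    # Temporal attention: weight each frame by its (emotional) salience.
    time_scores = reweighted @ w_t                   # (T,)
    time_weights = softmax(time_scores)              # (T,), sums to 1
    # Attention-weighted sum over frames yields the utterance embedding.
    return time_weights @ reweighted                 # (C,)

rng = np.random.default_rng(0)
T, C = 50, 8                                        # 50 frames, 8 channels
x = rng.standard_normal((T, C))
vec = cta_pool(x, rng.standard_normal(C), rng.standard_normal(C))
print(vec.shape)
```

In a real system the attention parameters would be trained jointly with the downstream recurrent classifier rather than drawn at random, and the pooled vectors from several encoder layers could be concatenated before classification.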
91. Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model
92. Fine-grained Noise Control for Multispeaker Speech Synthesis
93. Emotion Intensity and its Control for Emotional Voice Conversion
94. Automatic Speech recognition for Speech Assessment of Preschool Children
95. The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge
96. Dawn of the transformer era in speech emotion recognition: closing the valence gap
97. Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents
98. KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics
99. Automated speech tools for helping communities process restricted-access corpora for language revival efforts
100. Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach