
Hits 1 – 20 of 124

1	Subspace-based Representation and Learning for Phonotactic Spoken Language Recognition ...
	Lee, Hung-Shin; Tsao, Yu; Jeng, Shyh-Kang. - : arXiv, 2022
	BASE

2	Multilingual and Multimodal Abuse Detection ...
	Sharon, Rini; Shah, Heet; Mukherjee, Debdoot. - : arXiv, 2022
	BASE

3	WavThruVec: Latent speech representation as intermediate features for neural speech synthesis ...
	Siuzdak, Hubert; Dura, Piotr; van Rijn, Pol. - : arXiv, 2022
	BASE

4	Fine-grained Noise Control for Multispeaker Speech Synthesis ...
	Nikitaras, Karolos; Vamvoukakis, Georgios; Ellinas, Nikolaos. - : arXiv, 2022
	BASE

5	Emotion Intensity and its Control for Emotional Voice Conversion ...
	Zhou, Kun; Sisman, Berrak; Rana, Rajib. - : arXiv, 2022
	BASE

6	Low-dimensional representation of infant and adult vocalization acoustics ...
	Pagliarini, Silvia; Schneider, Sara; Kello, Christopher T.. - : arXiv, 2022
	BASE

7	Chain-based Discriminative Autoencoders for Speech Recognition ...
	Lee, Hung-Shin; Huang, Pin-Tuan; Cheng, Yao-Fei; Wang, Hsin-Min. - : arXiv, 2022
	Abstract: In our previous work, we proposed a discriminative autoencoder (DcAE) for speech recognition. DcAE combines two training schemes into one. First, since DcAE aims to learn encoder-decoder mappings, the squared error between the reconstructed speech and the input speech is minimized. Second, in the code layer, frame-based phonetic embeddings are obtained by minimizing the categorical cross-entropy between ground truth labels and predicted triphone-state scores. DcAE is developed based on the Kaldi toolkit by treating various TDNN models as encoders. In this paper, we further propose three new versions of DcAE. First, a new objective function that considers both categorical cross-entropy and mutual information between ground truth and predicted triphone-state sequences is used. The resulting DcAE is called a chain-based DcAE (c-DcAE). For application to robust speech recognition, we further extend c-DcAE to hierarchical and parallel structures, resulting in hc-DcAE and pc-DcAE. In these two models, both the ... : Submitted to Interspeech 2022 ...
	Keyword: Artificial Intelligence cs.AI; Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning cs.LG; Multimedia cs.MM; Sound cs.SD
	URL: https://dx.doi.org/10.48550/arxiv.2203.13687 https://arxiv.org/abs/2203.13687
	BASE

8	Filter-based Discriminative Autoencoders for Children Speech Recognition ...
	Tai, Chiang-Lin; Lee, Hung-Shin; Tsao, Yu. - : arXiv, 2022
	BASE

9	Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
	Peng, Puyuan; Harwath, David. - : arXiv, 2022
	BASE

10	Continual Learning for Monolingual End-to-End Automatic Speech Recognition ...
	Vander Eeckt, Steven; Van hamme, Hugo. - : arXiv, 2021
	BASE

11	Cetacean Translation Initiative: a roadmap to deciphering the communication of sperm whales ...
	Andreas, Jacob; Beguš, Gašper; Bronstein, Michael M.. - : arXiv, 2021
	BASE

12	Speech Representations and Phoneme Classification for Preserving the Endangered Language of Ladin ...
	Durante, Zane; Mathur, Leena; Ye, Eric. - : arXiv, 2021
	BASE

13	Applying Phonological Features in Multilingual Text-To-Speech ...
	Zhang, Cong; Zeng, Huinan; Liu, Huang. - : arXiv, 2021
	BASE

14	English Accent Accuracy Analysis in a State-of-the-Art Automatic Speech Recognition System ...
	Cámbara, Guillermo; Peiró-Lilja, Alex; Farrús, Mireia. - : arXiv, 2021
	BASE

15	Cross-lingual Low Resource Speaker Adaptation Using Phonological Features ...
	Maniati, Georgia; Ellinas, Nikolaos; Markopoulos, Konstantinos. - : arXiv, 2021
	BASE

16	Arabic Speech Recognition by End-to-End, Modular Systems and Human ...
	Hussein, Amir; Watanabe, Shinji; Ali, Ahmed. - : arXiv, 2021
	BASE

17	The INTERSPEECH 2021 Computational Paralinguistics Challenge: COVID-19 Cough, COVID-19 Speech, Escalation & Primates ...
	Schuller, Björn W.; Batliner, Anton; Bergler, Christian. - : arXiv, 2021
	BASE

18	Discrete representations in neural models of spoken language ...
	Higy, Bertrand; Gelderloos, Lieke; Alishahi, Afra. - : arXiv, 2021
	BASE

19	Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis ...
	Futamata, Kosuke; Park, Byeongseon; Yamamoto, Ryuichi. - : arXiv, 2021
	BASE

20	Learning De-identified Representations of Prosody from Raw Audio ...
	Weston, Jack; Lenain, Raphael; Meepegama, Udeepa. - : arXiv, 2021
	BASE
