
Hits 81 – 100 of 989

81	Common Phone: A Multilingual Dataset for Robust Acoustic Modelling ...
	Klumpp, Philipp; Arias-Vergara, Tomás; Pérez-Toro, Paula Andrea. - : arXiv, 2022
	BASE

82	Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end ...
	Hida, Rem; Hamada, Masaki; Kamada, Chie. - : arXiv, 2022
	BASE

83	Low-dimensional representation of infant and adult vocalization acoustics ...
	Pagliarini, Silvia; Schneider, Sara; Kello, Christopher T.. - : arXiv, 2022
	BASE

84	Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character ...
	Yang, Zhao; Xi, Wei; Wang, Rui. - : arXiv, 2022
	BASE

85	Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives ...
	Sadhu, Samik; Hermansky, Hynek. - : arXiv, 2022
	BASE

86	Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition ...
	Ma, Guodong; Hu, Pengfei; Kang, Jian. - : arXiv, 2022
	BASE

87	Similarity and Content-based Phonetic Self Attention for Speech Recognition ...
	Shim, Kyuhong; Sung, Wonyong. - : arXiv, 2022
	BASE

88	BERT-LID: Leveraging BERT to Improve Spoken Language Identification ...
	Nie, Yuting; Zhao, Junhong; Zhang, Wei-Qiang. - : arXiv, 2022
	BASE

89	Chain-based Discriminative Autoencoders for Speech Recognition ...
	Lee, Hung-Shin; Huang, Pin-Tuan; Cheng, Yao-Fei. - : arXiv, 2022
	BASE

90	Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis ...
	Wang, Zexun; Le, Yuquan; Zhu, Yi; Zhao, Yuming; Feng, Mingchao; Chen, Meng; He, Xiaodong. - : arXiv, 2022
	Abstract: Building Spoken Language Understanding (SLU) robust to Automatic Speech Recognition (ASR) errors is an essential issue for various voice-enabled virtual assistants. Considering that most ASR errors are caused by phonetic confusion between similar-sounding expressions, intuitively, leveraging the phoneme sequence of speech can complement ASR hypothesis and enhance the robustness of SLU. This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). The cross attention block is devised to catch the fine-grained interactions between phoneme and word embeddings in order to make the joint representations catch the phonetic and semantic features of input simultaneously and for overcoming the ASR errors in downstream natural language understanding (NLU) tasks. Extensive experiments are conducted on three datasets, showing the effectiveness and competitiveness of our approach. Additionally, we also validate the universality of CASLU and prove its complementarity when combining with other robust ... : ICASSP 2022 ...
	Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
	URL: https://arxiv.org/abs/2203.12067 https://dx.doi.org/10.48550/arxiv.2203.12067
	BASE

91	STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation ...
	Naeem, Saad; Beg, Omer. - : arXiv, 2022
	BASE

92	Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model ...
	Wang, Nick J. C.; Wang, Lu; Sun, Yandan. - : arXiv, 2022
	BASE

93	Speech segmentation using multilevel hybrid filters ...
	Faundez-Zanuy, Marcos; Vallverdu-Bayes, Francesc. - : arXiv, 2022
	BASE

94	On the relevance of language in speaker recognition ...
	Satue-Villar, Antonio; Faundez-Zanuy, Marcos. - : arXiv, 2022
	BASE

95	Improving speaker de-identification with functional data analysis of f0 trajectories ...
	Tavi, Lauri; Kinnunen, Tomi; Hautamäki, Rosa González. - : arXiv, 2022
	BASE

96	Unsupervised word-level prosody tagging for controllable speech synthesis ...
	Guo, Yiwei; Du, Chenpeng; Yu, Kai. - : arXiv, 2022
	BASE

97	Filter-based Discriminative Autoencoders for Children Speech Recognition ...
	Tai, Chiang-Lin; Lee, Hung-Shin; Tsao, Yu. - : arXiv, 2022
	BASE

98	Transducer-based language embedding for spoken language identification ...
	Shen, Peng; Lu, Xugang; Kawai, Hisashi. - : arXiv, 2022
	BASE

99	Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 ...
	Bayerl, Sebastian P.; Wagner, Dominik; Nöth, Elmar. - : arXiv, 2022
	BASE

100	Multi-sequence Intermediate Conditioning for CTC-based ASR ...
	Fujita, Yusuke; Komatsu, Tatsuya; Kida, Yusuke. - : arXiv, 2022
	BASE


© 2013 - 2024 Lin|gu|is|tik