Catalogue search • Linguistik portal • Fachinformationsdienst (FID)

1	PHO-LID: A Unified Model Incorporating Acoustic-Phonetic and Phonotactic Information for Language Identification ...
	Liu, Hexin; Perera, Leibny Paola Garcia; Khong, Andy W. H.; Styles, Suzy J.; Khudanpur, Sanjeev. - : arXiv, 2022
	Abstract: We propose a novel model to hierarchically incorporate phoneme and phonotactic information for language identification (LID) without requiring phoneme annotations for training. In this model, named PHO-LID, a self-supervised phoneme segmentation task and a LID task share a convolutional neural network (CNN) module, which encodes both language identity and sequential phonemic information in the input speech to generate an intermediate sequence of phonotactic embeddings. These embeddings are then fed into transformer encoder layers for utterance-level LID. We call this architecture CNN-Trans. We evaluate it on AP17-OLR data and the MLS14 set of NIST LRE 2017, and show that the PHO-LID model with multi-task optimization exhibits the highest LID performance among all models, achieving over 40% relative improvement in terms of average cost on AP17-OLR data compared to a CNN-Trans model optimized only for LID. The visualized confusion matrices imply that our proposed method achieves higher performance on languages ... : Submitted to Interspeech 2022, updated to the submitted version ...
	Keyword: Audio and Speech Processing eess.AS; FOS Electrical engineering, electronic engineering, information engineering
	URL: https://arxiv.org/abs/2203.12366 https://dx.doi.org/10.48550/arxiv.2203.12366
	BASE

2	Enhance Language Identification using Dual-mode Model with Knowledge Distillation ...
	Liu, Hexin; Perera, Leibny Paola Garcia; Khong, Andy W. H.. - : arXiv, 2022
	BASE
