
Search in the Catalogues and Directories

Hits 81 – 100 of 906

81
Unsupervised word-level prosody tagging for controllable speech synthesis ...
Guo, Yiwei; Du, Chenpeng; Yu, Kai. - : arXiv, 2022
BASE
82
Filter-based Discriminative Autoencoders for Children Speech Recognition ...
BASE
83
Transducer-based language embedding for spoken language identification ...
Shen, Peng; Lu, Xugang; Kawai, Hisashi. - : arXiv, 2022
Abstract: The acoustic and linguistic features are important cues for the spoken language identification (LID) task. Recent advanced LID systems mainly use acoustic features that lack the usage of explicit linguistic feature encoding. In this paper, we propose a novel transducer-based language embedding approach for LID tasks by integrating an RNN transducer model into a language embedding framework. Benefiting from the advantages of the RNN transducer's linguistic representation capability, the proposed method can exploit both phonetically-aware acoustic features and explicit linguistic features for LID tasks. Experiments were carried out on the large-scale multilingual LibriSpeech and VoxLingua107 datasets. Experimental results showed the proposed method significantly improves the performance on LID tasks with 12% to 59% and 16% to 24% relative improvement on in-domain and cross-domain datasets, respectively. ...
Note: This paper was submitted to Interspeech 2022 ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2204.03888
https://arxiv.org/abs/2204.03888
BASE
84
Multi-sequence Intermediate Conditioning for CTC-based ASR ...
BASE
85
Simple and Effective Unsupervised Speech Synthesis ...
BASE
86
Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
BASE
87
Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
Zhang, Cong; Zeng, Huinan; Liu, Huang. - : arXiv, 2022
BASE
88
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
Peng, Puyuan; Harwath, David. - : arXiv, 2022
BASE
89
CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations ...
BASE
90
Enhance Language Identification using Dual-mode Model with Knowledge Distillation ...
BASE
91
MAESTRO: Matched Speech Text Representations through Modality Matching ...
BASE
92
Cross-stitched Multi-modal Encoders ...
BASE
93
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
BASE
94
ASR-Aware End-to-end Neural Diarization ...
BASE
95
Wavebender GAN: An architecture for phonetically meaningful speech manipulation ...
BASE
96
Speaker Extraction with Co-Speech Gestures Cue ...
Pan, Zexu; Qian, Xinyuan; Li, Haizhou. - : arXiv, 2022
BASE
97
Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features ...
BASE
98
MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data ...
BASE
99
DeepFry: Identifying Vocal Fry Using Deep Neural Networks ...
BASE
100
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis ...
Lei, Yi; Yang, Shan; Wang, Xinsheng. - : arXiv, 2022
BASE

