
Search in the Catalogues and Directories

Hits 81 – 100 of 906

81
Unsupervised word-level prosody tagging for controllable speech synthesis ...
Guo, Yiwei; Du, Chenpeng; Yu, Kai. - : arXiv, 2022
BASE
82
Filter-based Discriminative Autoencoders for Children Speech Recognition ...
BASE
83
Transducer-based language embedding for spoken language identification ...
Shen, Peng; Lu, Xugang; Kawai, Hisashi. - : arXiv, 2022
BASE
84
Multi-sequence Intermediate Conditioning for CTC-based ASR ...
Abstract: End-to-end automatic speech recognition (ASR) directly maps input speech to a character sequence without using pronunciation lexica. However, in languages with thousands of characters, such as Japanese and Mandarin, modeling all these characters is problematic due to data scarcity. To alleviate this problem, we propose a multi-task learning model with explicit interaction between characters and syllables by utilizing the Self-conditioned connectionist temporal classification (CTC) technique. While the original Self-conditioned CTC estimates character-level intermediate predictions by applying auxiliary CTC losses to a set of intermediate layers, the proposed method additionally estimates syllable-level intermediate predictions in another set of intermediate layers. The character-level and syllable-level predictions are alternately used as conditioning features to deal with the mutual dependency between characters and syllables. Experimental results on Japanese and Mandarin datasets show that the proposed ... : This paper was submitted to INTERSPEECH 2022 ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://arxiv.org/abs/2204.00175
https://dx.doi.org/10.48550/arxiv.2204.00175
BASE
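The abstract above describes interleaving character-level and syllable-level intermediate predictions that are fed back into the encoder as conditioning features. The following is a minimal toy sketch of that conditioning pattern, not the authors' implementation. The function names (`encoder_forward`, `matvec`, `rand_matrix`), the residual tanh "layer", the random weights, and the even/odd alternation schedule over `cond_layers` are all hypothetical choices for illustration, and the auxiliary CTC losses themselves are not computed here.

```python
import math
import random


def softmax(v):
    """Numerically stable softmax over a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]


def matvec(W, x):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def rand_matrix(rng, rows, cols, scale=0.1):
    """Hypothetical random weights standing in for trained parameters."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def encoder_forward(frames, num_layers, dim, n_chars, n_syls, cond_layers, seed=0):
    """Toy self-conditioned encoder alternating character/syllable conditioning.

    frames: list of T input vectors, each of length `dim`.
    cond_layers: layer indices after which an intermediate prediction is made.
      Assumed schedule: even positions in this list use the character head,
      odd positions use the syllable head.
    Returns (final hidden states, list of (kind, per-frame posteriors)).
    """
    rng = random.Random(seed)
    layers = [rand_matrix(rng, dim, dim) for _ in range(num_layers)]
    heads = {"char": rand_matrix(rng, n_chars, dim), "syl": rand_matrix(rng, n_syls, dim)}
    back = {"char": rand_matrix(rng, dim, n_chars), "syl": rand_matrix(rng, dim, n_syls)}

    h = [list(f) for f in frames]
    intermediates = []
    for i, W in enumerate(layers):
        # Stand-in "encoder layer": residual tanh projection applied per frame.
        h = [[x + math.tanh(y) for x, y in zip(f, matvec(W, f))] for f in h]
        if i in cond_layers:
            kind = "char" if cond_layers.index(i) % 2 == 0 else "syl"
            # Intermediate prediction: per-frame posteriors over chars or syllables.
            post = [softmax(matvec(heads[kind], f)) for f in h]
            intermediates.append((kind, post))
            # Self-conditioning: project posteriors back and add them to the hidden states.
            h = [[x + y for x, y in zip(f, matvec(back[kind], p))] for f, p in zip(h, post)]
    return h, intermediates


if __name__ == "__main__":
    frames = [[0.1 * t, -0.2, 0.3, 0.05] for t in range(5)]
    h, inter = encoder_forward(frames, num_layers=6, dim=4, n_chars=7, n_syls=3,
                               cond_layers=[1, 2, 3, 4])
    print([kind for kind, _ in inter])  # ['char', 'syl', 'char', 'syl']
```

In a full model, each entry of `intermediates` would be scored with a CTC loss against the character or syllable transcription, respectively, and these auxiliary losses would be combined with the final-layer CTC loss during training.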
85
Simple and Effective Unsupervised Speech Synthesis ...
BASE
86
Multistream neural architectures for cued-speech recognition using a pre-trained visual feature extractor and constrained CTC decoding ...
BASE
87
Applying Feature Underspecified Lexicon Phonological Features in Multilingual Text-to-Speech ...
Zhang, Cong; Zeng, Huinan; Liu, Huang. - : arXiv, 2022
BASE
88
Self-Supervised Representation Learning for Speech Using Visual Grounding and Masked Language Modeling ...
Peng, Puyuan; Harwath, David. - : arXiv, 2022
BASE
89
CALM: Contrastive Aligned Audio-Language Multirate and Multimodal Representations ...
BASE
90
Enhance Language Identification using Dual-mode Model with Knowledge Distillation ...
BASE
91
MAESTRO: Matched Speech Text Representations through Modality Matching ...
BASE
92
Cross-stitched Multi-modal Encoders ...
BASE
93
Effect and Analysis of Large-scale Language Model Rescoring on Competitive ASR Systems ...
BASE
94
ASR-Aware End-to-end Neural Diarization ...
BASE
95
Wavebender GAN: An architecture for phonetically meaningful speech manipulation ...
BASE
96
Speaker Extraction with Co-Speech Gestures Cue ...
Pan, Zexu; Qian, Xinyuan; Li, Haizhou. - : arXiv, 2022
BASE
97
Lombard Effect for Bilingual Speakers in Cantonese and English: importance of spectro-temporal features ...
BASE
98
MetricGAN+/-: Increasing Robustness of Noise Reduction on Unseen Data ...
BASE
99
DeepFry: Identifying Vocal Fry Using Deep Neural Networks ...
BASE
100
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis ...
Lei, Yi; Yang, Shan; Wang, Xinsheng. - : arXiv, 2022
BASE

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings