1 | Multi-Dialect Speech Recognition With A Single Sequence-To-Sequence Model ... | BASE
2 | Detection of the Prodromal Phase of Bipolar Disorder from Psychological and Phonological Aspects in Social Media ... | BASE
4 | Additional file 1: of New insights on serodiagnosis of trichinellosis during window period: early diagnostic antigens from Trichinella spiralis intestinal worms ... | BASE
5 | Additional file 1: of New insights on serodiagnosis of trichinellosis during window period: early diagnostic antigens from Trichinella spiralis intestinal worms ... | BASE
6 | Video Captioning with Guidance of Multimodal Latent Topics ... | BASE
7 | Understanding the Changing Roles of Scientific Publications via Citation Embeddings ... | BASE
8 | Attacking Visual Language Grounding with Adversarial Examples: A Case Study on Neural Image Captioning ... | BASE
9 | Tree-Structured Neural Machine for Linguistics-Aware Sentence Generation ... | BASE
10 | A Scalable and Adaptive Method for Finding Semantically Equivalent Cue Words of Uncertainty ... | BASE
12 | A Semantic QA-Based Approach for Text Summarization Evaluation ... | BASE
13 | Adversarial Multi-Criteria Learning for Chinese Word Segmentation ... | BASE
14 | Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions ... | BASE
15 | A Sequential Matching Framework for Multi-turn Response Selection in Retrieval-based Chatbots ... | BASE
16 | Phonetic Temporal Neural Model for Language Identification ... | BASE
18 | End-to-End Attention based Text-Dependent Speaker Verification ... | BASE
20 | No Need for a Lexicon? Evaluating the Value of the Pronunciation Lexica in End-to-End Models ... | BASE
Sainath, Tara N.; Prabhavalkar, Rohit; Kumar, Shankar; Lee, Seungji; Kannan, Anjuli; Rybach, David; Schogol, Vlad; Nguyen, Patrick; Li, Bo; Wu, Yonghui; Chen, Zhifeng; Chiu, Chung-Cheng. arXiv, 2017
Abstract: For decades, context-dependent phonemes have been the dominant sub-word unit for conventional acoustic modeling systems. This status quo has begun to be challenged recently by end-to-end models which seek to combine acoustic, pronunciation, and language model components into a single neural network. Such systems, which typically predict graphemes or words, simplify the recognition process since they remove the need for a separate expert-curated pronunciation lexicon to map from phoneme-based units to words. However, there has been little previous work comparing phoneme-based versus grapheme-based sub-word units in the end-to-end modeling framework, to determine whether the gains from such approaches are primarily due to the new probabilistic model, or from the joint learning of the various components with grapheme-based units. In this work, we conduct detailed experiments which are aimed at quantifying the value of phoneme-based pronunciation lexica in the context of end-to-end models. We examine ...
Keyword: Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Machine Learning stat.ML; Sound cs.SD
URL: https://arxiv.org/abs/1712.01864 https://dx.doi.org/10.48550/arxiv.1712.01864