81. Common Phone: A Multilingual Dataset for Robust Acoustic Modelling ...
82. Polyphone disambiguation and accent prediction using pre-trained language models in Japanese TTS front-end ...
83. Low-dimensional representation of infant and adult vocalization acoustics ...
84. Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character ...
85. Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives ...
86. Leveraging Phone Mask Training for Phonetic-Reduction-Robust E2E Uyghur Speech Recognition ...
87. Similarity and Content-based Phonetic Self Attention for Speech Recognition ...
88. BERT-LID: Leveraging BERT to Improve Spoken Language Identification ...
89. Chain-based Discriminative Autoencoders for Speech Recognition ...
90. Building Robust Spoken Language Understanding by Cross Attention between Phoneme Sequence and ASR Hypothesis ...
Abstract:
Building Spoken Language Understanding (SLU) that is robust to Automatic Speech Recognition (ASR) errors is an essential issue for various voice-enabled virtual assistants. Considering that most ASR errors are caused by phonetic confusion between similar-sounding expressions, leveraging the phoneme sequence of speech can intuitively complement the ASR hypothesis and enhance the robustness of SLU. This paper proposes a novel model with Cross Attention for SLU (denoted as CASLU). The cross attention block is devised to catch the fine-grained interactions between phoneme and word embeddings so that the joint representations catch the phonetic and semantic features of the input simultaneously and overcome ASR errors in downstream natural language understanding (NLU) tasks. Extensive experiments are conducted on three datasets, showing the effectiveness and competitiveness of our approach. Additionally, we validate the universality of CASLU and prove its complementarity when combining with other robust ... : ICASSP 2022 ...
Keyword:
Audio and Speech Processing (eess.AS); Computation and Language (cs.CL); FOS: Computer and information sciences; FOS: Electrical engineering, electronic engineering, information engineering; Sound (cs.SD)
URL: https://arxiv.org/abs/2203.12067 https://dx.doi.org/10.48550/arxiv.2203.12067
91. STRATA: Word Boundaries & Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, & Data Augmentation ...
92. Three-Module Modeling For End-to-End Spoken Language Understanding Using Pre-trained DNN-HMM-Based Acoustic-Phonetic Model ...
95. Improving speaker de-identification with functional data analysis of f0 trajectories ...
96. Unsupervised word-level prosody tagging for controllable speech synthesis ...
97. Filter-based Discriminative Autoencoders for Children Speech Recognition ...
98. Transducer-based language embedding for spoken language identification ...
99. Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0 ...
100. Multi-sequence Intermediate Conditioning for CTC-based ASR ...