1.
Dual-Decoder Transformer For end-to-end Mandarin Chinese Speech Recognition with Pinyin and Character ...
Abstract:
End-to-end automatic speech recognition (ASR) has achieved promising results. However, most existing end-to-end ASR methods neglect language-specific characteristics. In Mandarin Chinese, pinyin (the spelling system) and characters (the writing system) reinforce each other. Based on this intuition, we investigate related model types that are suitable for, but not originally designed for, joint pinyin-character ASR, and propose a novel Mandarin Chinese ASR model with a dual-decoder Transformer tailored to the characteristics of pinyin transcripts and character transcripts. Specifically, a joint pinyin-character layer-wise linear interactive (LWLI) module and a phonetic posteriorgrams adapter (PPGA) are proposed to achieve inter-layer multi-level interaction by adaptively fusing pinyin and character information. Furthermore, a two-stage training strategy is proposed to make training more stable and convergence faster. The results on the test sets of the AISHELL-1 dataset ...
Keyword:
Audio and Speech Processing eess.AS; Computation and Language cs.CL; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://arxiv.org/abs/2201.10792 https://dx.doi.org/10.48550/arxiv.2201.10792
BASE
2.
Fine-Scale Population Admixture Landscape of Tai–Kadai-Speaking Maonan in Southwest China Inferred From Genome-Wide SNP Data
In: Front Genet (2022)
BASE
3.
Observation of new excited $B^0_s$ states
In: Eur.Phys.J.C ; https://hal.archives-ouvertes.fr/hal-03010999 ; Eur.Phys.J.C, 2021, 81 (7), pp.601. ⟨10.1140/epjc/s10052-021-09305-3⟩ (2021)
BASE
4.
Cross Modification Attention Based Deliberation Model for Image Captioning ...
BASE
9.
Syntax-aware Data Augmentation for Neural Machine Translation ...
BASE
10.
Knowledge Distillation for Multilingual Unsupervised Neural Machine Translation ...
BASE
11.
SG-Net: Syntax Guided Transformer for Language Representation ...
BASE
12.
Neural topic modeling with bidirectional adversarial training
BASE
13.
Measurement of $W^{\pm}$-boson and $Z$-boson production cross-sections in $pp$ collisions at $\sqrt{s}=2.76$ TeV with the ATLAS detector
BASE
14.
BiSET: Bi-directional Selective Encoding with Template for Abstractive Summarization ...
BASE
16.
Towards a Discipline of Multimodality: Parallels to Mathematics and Linguistics and New Ways Forward
BASE
17.
Open event extraction from online text using a generative adversarial network
BASE
18.
Exploring Recombination for Efficient Decoding of Neural Machine Translation ...
BASE
19.
Editorial for the special issue on heterogeneous sensors-based object identification and information fusion
In: ISSN: 1074-5351 ; EISSN: 1099-1131 ; International Journal of Communication Systems ; https://hal.laas.fr/hal-02091767 ; International Journal of Communication Systems, Wiley, 2017, 30 (5), pp.e3298. ⟨10.1002/dac.3298⟩ (2017)
BASE
20.
Structural Stability of Lexical Semantic Spaces: Nouns in Chinese and French ...
BASE