61 |
CTA-RNN: Channel and Temporal-wise Attention RNN Leveraging Pre-trained ASR Embeddings for Speech Emotion Recognition ...
62 |
Automatic Depression Detection: An Emotional Audio-Textual Corpus and a GRU/BiLSTM-based Model ...
63 |
Fine-grained Noise Control for Multispeaker Speech Synthesis ...
64 |
Emotion Intensity and its Control for Emotional Voice Conversion ...
65 |
Automatic Speech recognition for Speech Assessment of Preschool Children ...
66 |
The HCCL-DKU system for fake audio generation task of the 2022 ICASSP ADD Challenge ...
Abstract:
The voice conversion task is to modify the speaker identity of continuous speech while preserving the linguistic content. Naturalness and similarity are the two main metrics for evaluating conversion quality, which has improved significantly in recent years. This paper presents the HCCL-DKU entry for the fake audio generation task of the 2022 ICASSP ADD challenge. We propose a novel PPG-based voice conversion model that adopts a fully end-to-end structure. Experimental results show that the proposed method outperforms other conversion models, including Tacotron-based and FastSpeech-based models, on conversion quality and spoofing performance against anti-spoofing systems. In addition, we investigate several post-processing methods for better spoofing power. Finally, we achieve second place with a deception success rate of 0.916 in the ADD challenge. ...
Keyword:
Audio and Speech Processing eess.AS; FOS Computer and information sciences; FOS Electrical engineering, electronic engineering, information engineering; Sound cs.SD
URL: https://dx.doi.org/10.48550/arxiv.2201.12567 https://arxiv.org/abs/2201.12567
67 |
Dawn of the transformer era in speech emotion recognition: closing the valence gap ...
68 |
Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents ...
69 |
KazakhTTS2: Extending the Open-Source Kazakh TTS Corpus With More Data, Speakers, and Topics ...
70 |
Automated speech tools for helping communities process restricted-access corpora for language revival efforts ...
71 |
Classifying Autism from Crowdsourced Semi-Structured Speech Recordings: A Machine Learning Approach ...
73 |
Language-Independent Speaker Anonymization Approach using Self-Supervised Pre-Trained Models ...
74 |
Separate What You Describe: Language-Queried Audio Source Separation ...
75 |
A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition ...
76 |
DRSpeech: Degradation-Robust Text-to-Speech Synthesis with Frame-Level and Utterance-Level Acoustic Representation Learning ...
79 |
Impact of Naturalistic Field Acoustic Environments on Forensic Text-independent Speaker Verification System ...