1 |
Semi-supervised cycle-consistency training for end-to-end ASR using unpaired speech
BASE
3 |
Enforcing constraints for multi-lingual and cross-lingual speech-to-text systems
4 |
Knowledge base integration in biomedical natural language processing applications
5 |
Learning speech embeddings for speaker adaptation and speech understanding
6 |
Modeling phones, keywords, topics and intents in spoken languages
7 |
Speech technology for unwritten languages
In: IEEE/ACM Transactions on Audio, Speech and Language Processing, Institute of Electrical and Electronics Engineers, 2020, ⟨10.1109/TASLP.2020.2973896⟩ ; ISSN: 2329-9290 ; EISSN: 2329-9304 ; https://hal.inria.fr/hal-02480675
8 |
How Phonotactics Affect Multilingual and Zero-shot ASR Performance ...
9 |
That Sounds Familiar: an Analysis of Phonetic Representations Transfer Across Languages ...
10 |
Autosegmental Neural Nets: Should Phones and Tones be Synchronous or Asynchronous? ...
11 |
Identify Speakers in Cocktail Parties with End-to-End Attention ...
12 |
Acoustic event, spoken keyword and emotional outburst detection
Abstract:
This thesis presents work on three audio detection topics. First, it describes a system for large-scale multi-label acoustic event detection (AED) in YouTube videos. It explores the potential of state-of-the-art deep learning classifiers for AED, reports both qualitative and quantitative results (Hit@1 of 47.9%), and presents the pre-trained embedding model as a powerful feature extractor that can be adapted to new domains with limited data to improve detection accuracy (Hit@1 of 58.1%). Second, the thesis focuses on speech acoustic events and the spoken keyword spotting task. It presents a phonetic keyword spotter as a lightweight alternative to full speech recognition (3x faster, with comparable detection rates, and addressing automatic speech recognition problems). It also explores cross-lingual keyword spotting to support low-resource languages and finds that the acoustic model is the dominant factor in cross-lingual keyword search performance. Third, the thesis presents emotional outburst detection for infant nonspeech acoustic events. It reports on efforts to manually code child utterances as “laugh,” “cry,” “fuss,” “babble,” or “hiccup” and to develop algorithms capable of performing the same task automatically.
Keyword:
audio event detection; convolutional neural network; emotion detection; hidden Markov model; phonetic keyword spotter; speech recognition; spoken keyword detection
URL: http://hdl.handle.net/2142/105158
13 |
Automatic speech recognition for low-resource languages and dialects
15 |
The benefits of acoustic perceptual information for speech processing systems
16 |
Dealing with linguistic mismatches for automatic speech recognition
18 |
Bayesian models for unit discovery on a very low resource language
In: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Apr 2018, Calgary, Alberta, Canada ; https://hal.archives-ouvertes.fr/hal-01709589
19 |
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the “Speaking Rosetta” JSALT 2017 workshop
In: ICASSP 2018 - IEEE International Conference on Acoustics, Speech and Signal Processing, Apr 2018, Calgary, Alberta, Canada ; https://hal.archives-ouvertes.fr/hal-01709578
20 |
Linguistic unit discovery from multi-modal inputs in unwritten languages: Summary of the "Speaking Rosetta" JSALT 2017 Workshop ...