Home Catalogue search

eng

Refine your search:
- Keyword
- Creator / Publisher
- Year:
  - 2022 (55)
  - 2021 (164)
  - 2020 (88)
  - 2019 (83)
  - 2018 (92)
  - 2017 (79)
  - 2016 (81)
  - 2015 (108)
  - 2014 (88)
  - 2013 (94)
  - more
- Medium
- Type
- BLLDB-Access:
  - free (5.709)
  - subject to license (197)

Search in the Catalogues and Directories






	Sort by
Simple Search

Page: 1 2 3 4 5 6 7 8...286

Hits 61 – 80 of 5.709

61	LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
	Evain, Solène; Nguyen, Ha; Le, Hang...
	In: INTERSPEECH 2021: ; INTERSPEECH 2021: Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-03317730 ; INTERSPEECH 2021: Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic (2021)
	BASE
	Show details

62	Re-synchronization using the Hand Preceding Model for Multi-modal Fusion in Automatic Continuous Cued Speech Recognition
	Liu, Li; Feng, Gang; Beautemps, Denis...
	In: ISSN: 1520-9210 ; IEEE Transactions on Multimedia ; https://hal.archives-ouvertes.fr/hal-02433830 ; IEEE Transactions on Multimedia, Institute of Electrical and Electronics Engineers, 2021, 23, pp.292-305. ⟨10.1109/TMM.2020.2976493⟩ (2021)
	BASE
	Show details

63	Brain-Inspired Audio-Visual Information Processing Using Spiking Neural Networks
	Wendt, Anne. - : Auckland University of Technology, 2021
	BASE
	Show details

64	Identifying Speaker State from Multimodal Cues
	Yang, Zixiaofan. - 2021
	Abstract: Automatic identification of speaker state is essential for spoken language understanding, with broad potential in various real-world applications. However, most existing work has focused on recognizing a limited set of emotional states using cues from a single modality. This thesis describes my research that addresses these limitations and challenges associated with speaker state identification by studying a wide range of speaker states, including emotion and sentiment, humor, and charisma, using features from speech, text, and visual modalities. The first part of this thesis focuses on emotion and sentiment recognition in speech. Emotion and sentiment recognition is one of the most studied topics in speaker state identification and has gained increasing attention in speech research recently, with extensive emotional speech models and datasets published every year. However, most work focuses only on recognizing a set of discrete emotions in high-resource languages such as English, while in real-life conversations, emotion is changing continuously and exists in all spoken languages. To address the mismatch, we propose a deep neural network model to recognize continuous emotion by combining inputs from raw waveform signals and spectrograms. Experimental results on two datasets show that the proposed model achieves state-of-the-art results by exploiting both waveforms and spectrograms as input. Due to the higher number of existing textual sentiment models than speech models in low-resource languages, we also propose a method to bootstrap sentiment labels from text transcripts and use these labels to train a sentiment classifier in speech. Utilizing the speaker state information shared across modalities, we extend speech sentiment recognition from high-resource languages to low-resource languages. Moreover, using the natural verse-level alignment in the audio Bibles across different languages, we also explore cross-lingual and cross-modality sentiment transfer. In the second part of the thesis, we focus on recognizing humor, whose expression is related to emotion and sentiment but has very different characteristics. Unlike emotion and sentiment that can be identified by crowdsourced annotators, humorous expressions are highly individualistic and cultural-specific, making it hard to obtain reliable labels. This results in the lack of data annotated for humor, and thus we propose two different methods to automatically and reliably label humor. First, we develop a framework for generating humor labels on videos, by learning from extensive user-generated comments. We collect and analyze 100 videos, building multimodal humor detection models using speech, text, and visual features, which achieves an F1-score of 0.76. In addition to humorous videos, we also develop another framework for generating humor labels on social media posts, by learning from user reactions to Facebook posts. We collect 785K posts with humor and non-humor scores and build models to detect humor with performance comparable to human labelers. The third part of the thesis focuses on charisma, a commonly found but less studied speaker state with unique challenges -- the definition of charisma varies a lot among perceivers, and the perception of charisma also varies with speakers' and perceivers' different demographic backgrounds. To better understand charisma, we conduct the first gender-balanced study of charismatic speech, including speakers and raters from diverse backgrounds. We collect personality and demographic information from the rater as well as their own speech, and examine individual differences in the perception and production of charismatic speech. We also extend the work to politicians' speech by collecting speaker trait ratings on representative speech segments of politicians and study how the genre, gender, and the rater's political stance influence the charisma ratings of the segments.
	Keyword: Automatic speech recognition--Research; Computer science; Emotions; Facebook (Firm); Humor; Speech perception--Mathematical models
	URL: https://doi.org/10.7916/d8-nbyk-rq75
	BASE
	Hide details

65	Unsupervised Morphological Segmentation and Part-of-Speech Tagging for Low-Resource Scenarios
	Eskander, Ramy. - 2021
	BASE
	Show details

66	Jira: a Kurdish Speech Recognition System Designing and Building Speech Corpus and Pronunciation Lexicon
	Veisi, Hadi; Hosseini, Hawre; Mohammadamini, Mohammad...
	In: https://hal.archives-ouvertes.fr/hal-03140680 ; 2021 (2021)
	BASE
	Show details

67	Recognizing lexical units in low-resource language contexts with supervised and unsupervised neural networks
	MACAIRE, Cécile
	In: https://hal.archives-ouvertes.fr/hal-03429051 ; [Research Report] LACITO (UMR 7107). 2021 (2021)
	BASE
	Show details

68	COSMO-Onset: A Neurally-Inspired Computational Model of Spoken Word Recognition, Combining Top-Down Prediction and Bottom-Up Detection of Syllabic Onsets
	Nabé, Mamady; Schwartz, Jean-Luc; Diard, Julien
	In: ISSN: 1662-5137 ; Frontiers in Systems Neuroscience ; https://hal.archives-ouvertes.fr/hal-03318691 ; Frontiers in Systems Neuroscience, Frontiers, 2021, 15, pp.653975. ⟨10.3389/fnsys.2021.653975⟩ (2021)
	BASE
	Show details

69	Speech Normalization and Data Augmentation Techniques Based on Acoustical and Physiological Constraints and Their Applications to Child Speech Recognition
	Yeung, Gary Joseph. - : eScholarship, University of California, 2021
	BASE
	Show details

70	Automatic Speech Recognition : from hybrid to end-to-end approach ; Reconnaissance automatique de la parole à large vocabulaire : des approches hybrides aux approches End-to-End
	Heba, Abdelwahab. - : HAL CCSD, 2021
	In: https://tel.archives-ouvertes.fr/tel-03616588 ; Intelligence artificielle [cs.AI]. Université Paul Sabatier - Toulouse III, 2021. Français. ⟨NNT : 2021TOU30116⟩ (2021)
	BASE
	Show details

71	Large vocabulary automatic speech recognition: from hybrid to end-to-end approaches ; Reconnaissance automatique de la parole à large vocabulaire : des approches hybrides aux approches End-to-End
	Heba, Abdelwahab. - : HAL CCSD, 2021
	In: https://hal.archives-ouvertes.fr/tel-03269807 ; Son [cs.SD]. Université toulouse 3 Paul Sabatier, 2021. Français (2021)
	BASE
	Show details

72	Privacy and utility of x-vector based speaker anonymization
	Srivastava, Brij Mohan Lal; Maouche, Mohamed; Sahidullah, Md...
	In: https://hal.inria.fr/hal-03197376 ; 2021 (2021)
	BASE
	Show details

73	Supplementary material to the paper The VoicePrivacy 2020 Challenge: Results and findings
	Tomashenko, Natalia; Wang, Xin; Vincent, Emmanuel...
	In: https://hal.archives-ouvertes.fr/hal-03335126 ; 2021 (2021)
	BASE
	Show details

74	Supplementary material to the paper The VoicePrivacy 2020 Challenge: Results and findings
	Tomashenko, Natalia; Wang, Xin; Vincent, Emmanuel...
	In: https://hal.archives-ouvertes.fr/hal-03335126 ; 2021 (2021)
	BASE
	Show details

75	The VoicePrivacy 2020 Challenge: Results and findings
	Tomashenko, Natalia; Wang, Xin; Vincent, Emmanuel...
	In: https://hal.archives-ouvertes.fr/hal-03332224 ; 2021 (2021)
	BASE
	Show details

76	Supplementary material to the paper The VoicePrivacy 2020 Challenge: Results and findings
	Tomashenko, Natalia; Wang, Xin; Vincent, Emmanuel...
	In: https://hal.archives-ouvertes.fr/hal-03335126 ; 2021 (2021)
	BASE
	Show details

77	The VoicePrivacy 2020 Challenge: Results and findings
	Tomashenko, Natalia; Wang, Xin; Vincent, Emmanuel...
	In: https://hal.archives-ouvertes.fr/hal-03332224 ; 2021 (2021)
	BASE
	Show details

78	Enhancing Speech Privacy with Slicing
	Maouche, Mohamed; Srivastava, Brij Mohan Lal; Vauquier, Nathalie...
	In: https://hal.inria.fr/hal-03369137 ; 2021 (2021)
	BASE
	Show details

79	Supplementary material to the paper The VoicePrivacy 2020 Challenge: Results and findings
	Tomashenko, Natalia; Wang, Xin; Vincent, Emmanuel...
	In: https://hal.archives-ouvertes.fr/hal-03335126 ; 2021 (2021)
	BASE
	Show details

80	Training RNN Language Models on Uncertain ASR Hypotheses in Limited Data Scenarios
	Sheikh, Imran,; Vincent, Emmanuel; Illina, Irina
	In: https://hal.inria.fr/hal-03327306 ; 2021 (2021)
	BASE
	Show details

Page: 1 2 3 4 5 6 7 8...286

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Datenschutzeinstellungen ändern