1 |
Influence of Highly Inflected Word Forms and Acoustic Background on the Robustness of Automatic Speech Recognition for Human–Computer Interaction
In: Mathematics; Volume 10; Issue 5; Pages: 711 (2022)
2 |
Discriminative feature modeling for statistical speech recognition ...
3 |
Cross-lingual acoustic modeling in Upper Sorbian - preliminary study
In: Fraunhofer IKTS (2021)
4 |
Glottal Stops in Upper Sorbian: A Data-Driven Approach
In: Fraunhofer IKTS (2021)
5 |
Estimating the Degree of Sleepiness by Integrating Articulatory Feature Knowledge in Raw Waveform Based CNNS ...
6 |
Estimating the Degree of Sleepiness by Integrating Articulatory Feature Knowledge in Raw Waveform Based CNNS ...
7 |
Dealing with linguistic mismatches for automatic speech recognition
8 |
Speech recognition with probabilistic transcriptions and end-to-end systems using deep learning
9 |
Phonetic Context Embeddings for DNN-HMM Phone Recognition
In: Interspeech 2016, Sep 2016, San Francisco, United States, pp. 405-409 ; https://hal.sorbonne-universite.fr/hal-02166078 ; doi:10.21437/Interspeech.2016-1036 (2016)
11 |
Modeling of a rise-fall intonation pattern in the language of young Paris Speakers
In: Speech Prosody, 2014, 7, pp. 814-818 ; https://halshs.archives-ouvertes.fr/halshs-01069584 (2014)
12 |
Vers une modélisation acoustique de l'intonation des jeunes en région parisienne : une question de " proximité " ?
In: Nouveaux Cahiers de Linguistique Française, Université de Genève, 2014, 31, pp.257-171 ; ISSN: 1661-8246 ; EISSN: 1661-8246 ; https://halshs.archives-ouvertes.fr/halshs-01069593 (2014)
13 |
Towards the automatic processing of Yongning Na (Sino-Tibetan): developing a 'light' acoustic model of the target language and testing 'heavyweight' models from five national languages
In: Proceedings of the 4th International Workshop on Spoken Language Technologies for Under-resourced Languages (SLTU 2014), May 2014, St Petersburg, Russia, pp. 153-160 ; https://halshs.archives-ouvertes.fr/halshs-00980431 (2014)
14 |
Modélisation acoustico-phonétique de langues peu dotées : Études phonétiques et travaux de reconnaissance automatique en luxembourgois
In: Journées d'Etude sur la Parole, Jan 2014, Le Mans, France ; https://hal.archives-ouvertes.fr/hal-01843399 (2014)
15 |
Speech Alignment and Recognition Experiments for Luxembourgish
In: Proceedings of the 4th International Workshop on Spoken Language Technologies for Underresourced Languages, May 2014, Saint Petersburg, Russia, pp. 53-60 ; https://hal.archives-ouvertes.fr/hal-01134824 ; http://www.mica.edu.vn/sltu2014/ (2014)
16 |
A First LVCSR System for Luxembourgish, a Low-Resourced European Language
In: Zygmunt Vetulani; Joseph Mariani. Human Language Technology Challenges for Computer Science and Linguistics, 8387, Springer International Publishing, pp. 479-490, 2014, 5th Language and Technology Conference, LTC 2011, Poznań, Poland, November 25-27, 2011, Revised Selected Papers, 978-3-319-08957-7 ; https://hal.archives-ouvertes.fr/hal-01135103 ; doi:10.1007/978-3-319-08958-4_39 (2014)
17 |
Impact of Video Modeling Techniques on Efficiency and Effectiveness of Clinical Voice Assessment
In: http://rave.ohiolink.edu/etdc/view?acc_num=miami1398686540 (2014)
18 |
Anger Recognition in Speech Using Acoustic and Linguistic Cues
Elsevier, 2013
Abstract:
The present study elaborates on the exploitation of both linguistic and acoustic feature modeling for anger classification. In terms of acoustic modeling, we generate statistics from acoustic audio descriptors, e.g. pitch, loudness, and spectral characteristics. Ranking our features, we see that loudness and MFCC seem most promising for all databases. For the English database, pitch features are also important. In terms of linguistic modeling, we apply probabilistic and entropy-based models of words and phrases, e.g. Bag-of-Words (BOW), Term Frequency (TF), Term Frequency - Inverse Document Frequency (TF.IDF) and the Self-Referential Information (SRI). SRI clearly outperforms vector space models. Modeling phrases slightly improves the scores. After classifying acoustic and linguistic information on separate levels, we fuse the information at the decision level by adding confidences. We compare the obtained scores on three different databases. Two databases are taken from the IVR customer care domain; another database accounts for a WoZ data collection. All corpora contain speech under realistic conditions. We observe promising results for the IVR databases, while the WoZ database shows overall lower scores. To make the results comparable, we evaluate classification success using the f1 measurement in addition to overall accuracy figures. As a result, acoustic modeling clearly outperforms linguistic modeling. Fusion slightly improves overall scores. With a baseline of approximately 60% accuracy and .40 f1 measurement by constant majority class voting, we obtain an accuracy of 75% with respective .70 f1 for the WoZ database. For the IVR databases, we obtain approximately 79% accuracy with respective .78 f1 over a baseline of 60% accuracy with respective .38 f1.
Correspondence: Polzehl, Tim (tim.polzehl@gmail.com) ; Affiliations: Quality and Usability Lab, Technische Universität Berlin / Deutsche Telekom Laboratories, Ernst-Reuter-Platz 7, D-10587 Berlin, Germany (Polzehl, Tim) ; Dialogue Systems Group / Institute of Information Technology, University of Ulm, Albert-Einstein-Allee 43, D-89081 Ulm, Germany (Schmitt, Alexander) ; Language Technologies Institute, Carnegie Mellon University, 5000 Forbes Avenue, Pittsburgh, PA 15213, U.S.A. (Metze, Florian) ; National Centre for Biometric Studies, University of Canberra, ACT 2601, Australia (Wagner, Michael) ; Received: 2010-05-01 ; Revised: 2011-02-10 ; Accepted: 2011-05-04
Keyword:
anger classification; decision fusion; emotion detection; IGR ranking; IVR speech; linguistic and prosodic acoustic modeling
URL: http://hdl.handle.net/2262/65936 https://doi.org/10.1016/j.specom.2011.05.002
19 |
Detection of acoustic-phonetic landmarks in mismatched conditions using a biomimetic model of human auditory processing
In: http://www.isle.uiuc.edu/%7Esborys/king_coling12.pdf (2012)
20 |
Detection of acoustic-phonetic landmarks in mismatched conditions using a biomimetic model of human auditory processing
In: http://aclweb.org/anthology/C/C12/C12-2058.pdf (2012)