1 | Articulatory representations to address acoustic variability in speech
BASE
2 | Improved vocal tract reconstruction and modeling using an image super-resolution technique
BASE
4 | Retrieving Tract Variables From Acoustics: A Comparison of Different Machine Learning Strategies
BASE
6 | An MRI-based articulatory and acoustic study of American English liquid sounds /r/ and /l/
BASE
7 | A magnetic resonance imaging-based articulatory and acoustic study of “retroflex” and “bunched” American English /r/
BASE
8 | Synergy of Acoustic-Phonetics and Auditory Modeling Towards Robust Speech Recognition
BASE
9 | Robust Voice Mining Techniques for Telephone Conversations
Abstract:
Voice mining involves speaker detection in a set of multi-speaker files. In published work, training data is used to construct target speaker models. In this study, a new voice mining scenario was considered in which there is no demarcation between training and testing data and prior target speaker models are absent. Given a database of telephone conversations, the task is to identify conversations that have one or more speakers in common. Various approaches, including semi-automatic and fully automatic techniques, were explored, and different scoring strategies were considered. Given the poor audio quality, automatic speaker segmentation is not very effective. A new technique was developed that avoids speaker segmentation by training a multi-speaker model on the entire conversation. This technique is more robust and outperforms the automatic speaker segmentation approach. On the ENRON database, the equal error rate (EER) is 15.98% and 6.25% for at least one and two speakers in common, respectively.
Keyword:
Engineering, Electronics and Electrical; speaker detection; speaker recognition; voice mining
URL: http://hdl.handle.net/1903/3827
BASE
17 | Intraspeaker Comparisons of Acoustic and Articulatory Variability in American English /r/ Productions
BASE