Hits 1 – 6 of 6

1	Many-to-Many Voice Conversion based Feature Disentanglement using Variational Autoencoder ...
	Luong, Manh; Tran, Viet Anh. - : arXiv, 2021
	BASE

2	Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization
	Tran, Viet-Anh; Le, Viet; Barras, Claude...
	In: Interspeech 2011 ; https://hal.archives-ouvertes.fr/hal-01690265 ; Interspeech 2011, Aug 2011, Florence, Italy (2011)
	BASE

3	Analysis and recognition of NAM speech using HMM distances and visual information
	Heracleous, Panikos; Tran, Viet-Anh; Nagai, Takayuki...
	In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 18 (2010) 6, 1528-1538
	BLLDB

4	Improvement to a NAM-captured whisper-to-speech system
	Bailly, Gérard; Tran, Viet-Anh; Toda, Tomoki...
	In: Speech communication. - Amsterdam [u.a.] : Elsevier 52 (2010) 4, 314-326
	BLLDB
	OLC Linguistik

5	Improvement to a NAM-captured whisper-to-speech system
	Tran, Viet-Anh; Bailly, Gérard; Loevenbruck, Hélène; Toda, Tomoki
	In: ISSN: 0167-6393 ; EISSN: 1872-7182 ; Speech Communication ; https://hal.archives-ouvertes.fr/hal-00616229 ; Speech Communication, Elsevier : North-Holland, 2010, 52 (4), pp.314. ⟨10.1016/j.specom.2009.11.005⟩ (2010)
	Abstract: Exploiting a tissue-conductive sensor - a stethoscopic microphone - the system developed at NAIST, which converts Non-Audible Murmur (NAM) to audible speech by GMM-based statistical mapping, is a very promising technique. The quality of the converted speech is, however, still insufficient for computer-mediated communication, notably because of the poor estimation of F0 from unvoiced speech and because of impoverished phonetic contrasts. This paper presents our investigations to improve the intelligibility and naturalness of the synthesized speech, together with first objective and subjective evaluations of the resulting system. The first improvement concerns voicing and F0 estimation. Instead of using a single GMM for both, we estimate a continuous F0 using a GMM trained on target voiced segments only. The continuous F0 estimation is filtered by a voicing decision computed by a neural network. The objective and subjective improvement is significant. The second improvement concerns the input time window and its dimensionality reduction: we show that the precision of F0 estimation is also significantly improved by extending the input time window from 90 to 450 ms and by using Linear Discriminant Analysis (LDA) instead of the original Principal Component Analysis (PCA). Estimation of the spectral envelope is also slightly improved with LDA but is degraded with larger time windows. A third improvement consists in adding visual parameters as both input and output parameters. The positive contribution of this information is confirmed by a subjective test. Finally, HMM-based conversion is compared with GMM-based conversion.
	Keyword: non-audible murmur
	URL: https://doi.org/10.1016/j.specom.2009.11.005 https://hal.archives-ouvertes.fr/hal-00616229/document https://hal.archives-ouvertes.fr/hal-00616229/file/PEER_stage2_10.1016%252Fj.specom.2009.11.005.pdf https://hal.archives-ouvertes.fr/hal-00616229
	BASE

6	Silent Communication: whispered speech-to-clear speech conversion ; Communication silencieuse: conversion de la parole chuchotée en parole claire
	Tran, Viet-Anh. - : HAL CCSD, 2010
	In: https://tel.archives-ouvertes.fr/tel-00614289 ; Computer Science [cs]. Institut National Polytechnique de Grenoble - INPG, 2010. English (2010)
	BASE
