Search in the Catalogues and Directories

Hits 1 – 5 of 5

1
Lexical speaker identification in TV shows
In: ISSN: 1380-7501 ; EISSN: 1573-7721 ; Multimedia Tools and Applications ; https://hal.archives-ouvertes.fr/hal-01690342 ; Multimedia Tools and Applications, Springer Verlag, 2015, 74 (4), pp. 1377-1396. ⟨10.1007/s11042-014-1940-3⟩ (2015)
BASE
2
TVD: a reproducible and multiply aligned TV series dataset
In: LREC 2014 ; https://hal.archives-ouvertes.fr/hal-01690279 ; LREC 2014, May 2014, Reykjavik, Iceland (2014)
BASE
3
"Sheldon speaking, bonjour!": Leveraging Multilingual Tracks for (Weakly) Supervised Speaker Identification
In: ACM MM 2014, 22nd ACM International Conference on Multimedia ; https://hal.archives-ouvertes.fr/hal-01987812 ; ACM MM 2014, 22nd ACM International Conference on Multimedia, 2014, Orlando, United States (2014)
Abstract: We address the problem of speaker identification in multimedia data, and in TV series in particular. While speaker identification is traditionally a supervised machine-learning task, our first contribution is to significantly reduce the need for costly preliminary manual annotations through the use of automatically aligned (and potentially noisy) fan-generated transcripts and subtitles. We show that both the speech activity detection and the speech turn identification modules trained in this weakly supervised manner achieve performance similar to that of their fully supervised counterparts (i.e. those relying on fine manual speech/non-speech/speaker annotation). Our second contribution relates to the use of the multilingual audio tracks usually available with this kind of content to significantly improve the overall speaker identification performance. Reproducible experiments (including dataset, manual annotations and source code) performed on the first six episodes of The Big Bang Theory TV series show that combining the French audio track (containing dubbed actor voices) with the English one (with the original actor voices) improves the overall English speaker identification performance by 5% absolute and up to 70% relative on the five main characters.
Keyword: [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing
URL: https://hal.archives-ouvertes.fr/hal-01987812
BASE
4
Some issues affecting the transcription of Hungarian broadcast audio
In: Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-01843430 ; Annual Conference of the International Speech Communication Association, Aug 2013, Lyon, France (2013)
BASE
5
Acoustic unit discovery and pronunciation generation from a grapheme-based lexicon
In: IEEE Automatic Speech Recognition and Understanding Workshop ; https://hal.archives-ouvertes.fr/hal-01843433 ; IEEE Automatic Speech Recognition and Understanding Workshop, Dec 2013, Olomouc, Czech Republic (2013)
BASE
