Hits 1 – 5 of 5

1	Lexical speaker identification in TV shows
	Roy, Anindya; Bredin, Hervé; Hartmann, William...
	In: Multimedia Tools and Applications, Springer Verlag, 2015, 74 (4), pp. 1377-1396. ISSN: 1380-7501; EISSN: 1573-7721. DOI: 10.1007/s11042-014-1940-3. https://hal.archives-ouvertes.fr/hal-01690342
	BASE

2	TVD: a reproducible and multiply aligned TV series dataset
	Roy, Anindya; Guinaudeau, Camille; Bredin, Hervé...
	In: LREC 2014, May 2014, Reykjavik, Iceland. https://hal.archives-ouvertes.fr/hal-01690279
	BASE

3	"Sheldon speaking, bonjour!": Leveraging Multilingual Tracks for (Weakly) Supervised Speaker Identification
	Bredin, Hervé; Roy, Anindya; Pécheux, Nicolas; Allauzen, Alexandre
	In: ACM MM 2014, 22nd ACM International Conference on Multimedia, 2014, Orlando, United States. https://hal.archives-ouvertes.fr/hal-01987812
	Abstract: We address the problem of speaker identification in multimedia data, and in TV series in particular. While speaker identification is traditionally a supervised machine-learning task, our first contribution is to significantly reduce the need for costly preliminary manual annotations by using automatically aligned (and potentially noisy) fan-generated transcripts and subtitles. We show that both the speech activity detection and the speech turn identification modules trained in this weakly supervised manner perform on par with their fully supervised counterparts (i.e., those relying on fine manual speech/non-speech/speaker annotation). Our second contribution is to use the multilingual audio tracks usually available with this kind of content to significantly improve overall speaker identification performance. Reproducible experiments (including dataset, manual annotations and source code) performed on the first six episodes of The Big Bang Theory TV series show that combining the French audio track (containing dubbed actor voices) with the English one (containing the original actor voices) improves overall English speaker identification performance by 5% absolute and up to 70% relative on the five main characters.
	Keyword: [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing
	URL: https://hal.archives-ouvertes.fr/hal-01987812
	BASE

4	Some issues affecting the transcription of Hungarian broadcast audio
	Roy, Anindya; Lamel, Lori; Fraga Da Silva, Thiago...
	In: Annual Conference of the International Speech Communication Association, Aug 2013, Lyon, France. https://hal.archives-ouvertes.fr/hal-01843430
	BASE

5	Acoustic unit discovery and pronunciation generation from a grapheme-based lexicon
	Hartmann, William; Roy, Anindya; Lamel, Lori...
	In: IEEE Automatic Speech Recognition and Understanding Workshop, Dec 2013, Olomouc, Czech Republic. https://hal.archives-ouvertes.fr/hal-01843433
	BASE
