
Search in the Catalogues and Directories

Hits 1 – 20 of 31

1
Vocapia-LIMSI System for 2020 Shared Task on Code-switched Spoken Language Identification
In: The First Workshop on Speech Technologies for Code-Switching in Multilingual Communities ; https://hal.archives-ouvertes.fr/hal-03091792 ; The First Workshop on Speech Technologies for Code-Switching in Multilingual Communities, Oct 2020, Shanghai, China (2020)
BASE
2
Low-latency speaker spotting with online diarization and detection
In: The Speaker and Language Recognition Workshop ; https://hal.archives-ouvertes.fr/hal-01836490 ; The Speaker and Language Recognition Workshop, ISCA, Jun 2018, Les Sables d'Olonne, France (2018)
BASE
3
Combining Speaker Turn Embedding and Incremental Structure Prediction for Low-Latency Speaker Diarization
In: Interspeech 2017, 18th Annual Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-01690162 ; Interspeech 2017, 18th Annual Conference of the International Speech Communication Association, Aug 2017, Stockholm, Sweden. ⟨10.21437/Interspeech.2017-1067⟩ (2017)
BASE
4
Multimodal Person Discovery in Broadcast TV: lessons learned from MediaEval 2015
In: ISSN: 1380-7501 ; EISSN: 1573-7721 ; Multimedia Tools and Applications ; https://hal.archives-ouvertes.fr/hal-01690581 ; Multimedia Tools and Applications, Springer Verlag, 2017, 76 (21), pp.22547 - 22567. ⟨10.1007/s11042-017-4730-x⟩ (2017)
BASE
5
Benchmarking Multimedia Technologies with the CAMOMILE Platform: the Case of Multimodal Person Discovery at MediaEval 2015
In: LREC 2016 ; https://hal.archives-ouvertes.fr/hal-01690277 ; LREC 2016, May 2016, Portorož, Slovenia (2016)
BASE
6
The CAMOMILE Collaborative Annotation Platform for Multi-modal, Multi-lingual and Multi-media Documents
In: Proceedings of LREC 2016 ; LREC 2016 Conference ; https://hal.archives-ouvertes.fr/hal-01350096 ; LREC 2016 Conference, May 2016, Portoroz, Slovenia (2016)
BASE
7
Lexical speaker identification in TV shows
In: ISSN: 1380-7501 ; EISSN: 1573-7721 ; Multimedia Tools and Applications ; https://hal.archives-ouvertes.fr/hal-01690342 ; Multimedia Tools and Applications, Springer Verlag, 2015, 74 (4), pp.1377 - 1396. ⟨10.1007/s11042-014-1940-3⟩ (2015)
Abstract: The final publication is available at https://link.springer.com/article/10.1007/s11042-014-1940-3 ; International audience ; Lexical information extracted from speech transcripts can be used for speaker identification (SID), either on its own or, through fusion, to improve the performance of standard cepstral-based SID systems. This was previously established mostly on isolated speech from single speakers (NIST SRE corpora, parliamentary speeches). In contrast, this work applies lexical approaches to SID on a different type of data: the REPERE corpus, which consists of unsegmented multiparty conversations, mostly debates, discussions and Q&A sessions from TV shows. It is hypothesized that people give out clues to their identity when speaking in such settings, which this work aims to exploit. The impact on SID performance of the diarization front-end required to pre-process the unsegmented data is also measured. Four lexical SID approaches are studied in this work, including TFIDF, BM25 and LDA-based topic modeling. Results are analysed in terms of TV shows and speaker roles. Lexical approaches achieve low error rates for certain speaker roles such as anchors and journalists, sometimes lower than a standard cepstral-based Gaussian Supervector-Support Vector Machine (GSV-SVM) system. In certain cases, score-level sum fusion with the lexical system also yields a modest improvement over the cepstral-based system alone. To highlight the potential of lexical information not just to improve upon cepstral-based SID systems but as an independent approach in its own right, initial studies on crossmedia SID are briefly reported. Instead of using speech data, as all cepstral systems require, this approach uses Wikipedia texts to train lexical speaker models, which are then tested on speech transcripts to identify speakers.
Keyword: [INFO.INFO-MM]Computer Science [cs]/Multimedia [cs.MM]; [INFO]Computer Science [cs]
URL: https://hal.archives-ouvertes.fr/hal-01690342/file/paper_v0.pdf
https://hal.archives-ouvertes.fr/hal-01690342/document
https://doi.org/10.1007/s11042-014-1940-3
https://hal.archives-ouvertes.fr/hal-01690342
BASE
8
Analysing rhythm in ritual discourse in Yucatec Maya using automatic speech alignment
In: Interspeech 2015 Speech beyond speech ; https://halshs.archives-ouvertes.fr/halshs-01250490 ; Interspeech 2015 Speech beyond speech, Sep 2015, Dresden, Germany ; http://interspeech2015.org/ (2015)
BASE
9
Collaborative Annotation for Person Identification in TV Shows
In: Interspeech 2015 (short demo paper) ; https://hal.archives-ouvertes.fr/hal-01170513 ; Interspeech 2015 (short demo paper), Sep 2015, Dresden, Germany (2015)
BASE
10
TVD: a reproducible and multiply aligned TV series dataset
In: LREC 2014 ; https://hal.archives-ouvertes.fr/hal-01690279 ; LREC 2014, May 2014, Reykjavik, Iceland (2014)
BASE
11
Study of vowels and Voice Strength by Discriminant Analysis ; Etude des voyelles et de la force de voix par analyse discriminante
In: ISCA JEP2014 ; 30emes Journees d'Etude sur la Parole ; https://hal.archives-ouvertes.fr/hal-01885618 ; 30emes Journees d'Etude sur la Parole, ISCA AFCP, Jun 2014, Le Mans, France (2014)
BASE
12
Combination of Cepstral and Phonetically Discriminative Features for Speaker Verification
In: ISSN: 1070-9908 ; IEEE Signal Processing Letters ; https://hal.archives-ouvertes.fr/hal-01690336 ; IEEE Signal Processing Letters, Institute of Electrical and Electronics Engineers, 2014, 21 (9), pp.1040 - 1044. ⟨10.1109/LSP.2014.2323432⟩ (2014)
BASE
13
Impact of overlapping speech detection on speaker diarization for broadcast news and debates
In: IEEE International Conference on Acoustics, Speech, and Signal Processing ; https://hal.archives-ouvertes.fr/hal-01836475 ; IEEE International Conference on Acoustics, Speech, and Signal Processing, Jan 2013, Vancouver, Canada (2013)
BASE
14
Towards a better integration of written names for unsupervised speakers identification in videos
In: First Workshop on Speech, Language and Audio in Multimedia, SLAM ; https://hal.inria.fr/hal-00953089 ; First Workshop on Speech, Language and Audio in Multimedia, SLAM, 2013, Marseille, France (2013)
BASE
15
Une étude quantitative des marqueurs discursifs, disfluences et chevauchements de parole dans des interviews politiques
In: ISSN: 2118-870X ; EISSN: 2264-7082 ; Travaux Interdisciplinaires du Laboratoire Parole et Langage d'Aix-en-Provence (TIPA) ; https://hal.archives-ouvertes.fr/hal-01135042 ; Travaux Interdisciplinaires du Laboratoire Parole et Langage d'Aix-en-Provence (TIPA), Laboratoire Parole et Langage, 2013, pp.18. ⟨10.4000/tipa.830⟩ (2013)
BASE
16
Lattice MLLR based m-vector system for speaker verification
In: IEEE International Conference on Acoustics, Speech, and Signal Processing ; https://hal.archives-ouvertes.fr/hal-01836461 ; IEEE International Conference on Acoustics, Speech, and Signal Processing, Jan 2013, Vancouver, Canada (2013)
BASE
17
Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast
In: Proceedings of the 13th Annual Conference of the International Speech Communication Association (Interspeech) ; Interspeech 2012 - Conference of the International Speech Communication Association ; https://hal.archives-ouvertes.fr/hal-00767427 ; Interspeech 2012 - Conference of the International Speech Communication Association, Sep 2012, Portland, OR, United States. 4p (2012)
BASE
18
Comparing Multi-Stage Approaches for Cross-Show Speaker Diarization
In: Interspeech 2011 ; https://hal.archives-ouvertes.fr/hal-01690265 ; Interspeech 2011, Aug 2011, Florence, Italy (2011)
BASE
19
Time structure and detection of the multivoiced segments in mixed speech
In: International Congress of Phonetic Sciences ; https://hal.archives-ouvertes.fr/hal-01836479 ; International Congress of Phonetic Sciences, Jan 2011, Hong Kong, China (2011)
BASE
20
Comparison of speaker adaptation methods as feature extraction for SVM-based speaker recognition
In: Institute of Electrical and Electronics Engineers. IEEE transactions on audio, speech and language processing. - New York, NY : Inst. 18 (2010) 6, 1366-1378
BLLDB
