81 |
Une nouvelle approche de modélisation du langage par des réseaux Bayésiens dynamiques [English: A new approach to language modeling using dynamic Bayesian networks]
|
|
|
|
In: XXVes Journées d'Etudes sur la Parole - JEP-TALN-RECITAL 2004, Fès, Morocco (2004) ; https://hal.inria.fr/inria-00107785
82 |
Language modeling using dynamic Bayesian networks
|
|
|
|
In: 4th International Conference on Language Resources and Evaluation - LREC 2004, Lisbon, Portugal (2004) ; https://hal.inria.fr/inria-00107786
83 |
Statistical Feature Language Model
|
|
|
|
In: 8th International Conference on Spoken Language Processing - ICSLP'2004, Jeju, South Korea, 4 p. (2004) ; https://hal.inria.fr/inria-00100021
87 |
Understanding speech based on a Bayesian concept extraction method
|
|
|
|
In: Sixth International Conference on Text Speech and Dialogue - TSD'03, Sep 2003, České Budějovice, Czech Republic, 8 p. (2003) ; https://hal.inria.fr/inria-00099696
88 |
Understanding process for speech recognition
|
|
|
|
In: Eighth European Conference on Speech Communication and Technology - EuroSpeech'03, Sep 2003, Geneva, Switzerland, 4 p. (2003) ; https://hal.inria.fr/inria-00099695
89 |
Événements impossibles en modélisation stochastique du langage [English: Impossible events in stochastic language modeling]
|
|
|
|
In: Revue TAL, ATALA (Association pour le Traitement Automatique des Langues), 44 (1), pp. 33-61 (2003) ; ISSN: 1248-9433 ; EISSN: 1965-0906 ; https://hal.inria.fr/inria-00099594
90 |
Statistical Language Modeling Based on Variable-Length Sequences
|
|
|
|
In: Computer Speech and Language, Elsevier, 17 (1), pp. 27-41 (2003) ; ISSN: 0885-2308 ; EISSN: 1095-8363 ; https://hal.inria.fr/inria-00099785
|
|
Abstract:
Peer-reviewed journal article. In natural language, and especially in spontaneous speech, people often group words into phrases that become usual expressions. This happens for phonological reasons (to make pronunciation easier) or for semantic reasons (to remember a phrase more easily by assigning a meaning to a block of words). Classical language models do not adequately take such phrases into account. A better approach is to model some word sequences as if they were individual dictionary elements: sequences are treated as additional vocabulary entries on which language models are computed. In this paper, we present a method for automatically retrieving the most relevant phrases from a corpus of written sentences. The originality of our approach is that the phrases are extracted from a linguistically tagged corpus, so the obtained phrases are linguistically viable. To measure the contribution of classes in retrieving phrases, we implemented the same algorithm without using classes; the class-based method outperformed the other method by 11%. Our approach uses information-theoretic criteria that ensure high statistical consistency and make the decision to select a potential sequence optimal with respect to the language perplexity. We propose several variants of language models, with and without word sequences. Among them, we present a model in which the trigger pairs are linguistically more significant. We show that the use of sequences decreases the word error rate and improves the normalized perplexity. For instance, the best sequence model improves the perplexity by 16% and the accuracy of our dictation system (MAUD) by approximately 14%. Experiments, in terms of perplexity and recognition rate, were carried out on a vocabulary of 20000 words extracted from a corpus of 43 million words made up of two years of the French newspaper Le Monde. The acoustic model (HMM) is trained with the Bref80 corpus.
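The abstract reports a normalized perplexity without defining it here. As a minimal sketch of one common convention (an assumption, not necessarily the authors' definition), the log-likelihood of a text segmented into units is divided by the number of underlying words rather than the number of units, so that models with and without multi-word sequences stay comparable. The function name, the `_` separator for multi-word units, and the probabilities below are all hypothetical.

```python
import math

def normalized_perplexity(unit_log_probs, units, sep="_"):
    """Per-word perplexity of a text whose units may be multi-word sequences.

    unit_log_probs: natural-log probability assigned to each unit.
    units: the units themselves; a multi-word sequence is written with
           its words joined by `sep` (hypothetical convention).
    """
    # Count words, not units, so a merged sequence is not rewarded
    # simply for shortening the text.
    n_words = sum(len(u.split(sep)) for u in units)
    total_log_prob = sum(unit_log_probs)
    return math.exp(-total_log_prob / n_words)

# Illustrative values only: three units covering four words.
units = ["il", "y_a", "beaucoup"]
log_probs = [math.log(0.1), math.log(0.05), math.log(0.01)]

pp = normalized_perplexity(log_probs, units)
# Unnormalized (per-unit) perplexity, for comparison.
pp_units = math.exp(-sum(log_probs) / len(units))
```

With these made-up numbers the per-unit figure is larger than the per-word figure, which illustrates why the two cannot be compared directly across vocabularies with different unit lengths.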
|
|
Keyword:
[INFO.INFO-OH]Computer Science [cs]/Other [cs.OH]; cache; language modeling; normalized perplexity; phrases; sequences; triggers
|
|
URL: https://hal.inria.fr/inria-00099785
91 |
Dynamic Topic Identification : Introduction of Trigger pairs in the Cache Model
|
|
|
|
In: International Workshop Speech and Computer 2002 - SPECOM'2002, St. Petersburg, Russia, 4 p. (2002) ; https://hal.inria.fr/inria-00100828
92 |
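Projet RAIVES (Recherche Automatique d'Informations Verbales Et Sonores) vers l'extraction et la structuration de données radiophoniques sur Internet [English: RAIVES project (Automatic Retrieval of Verbal and Audio Information): toward the extraction and structuring of radio data on the Internet]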
|
|
|
|
In: [Contract] A02-R-553 || parlangeau-valles02a, IRIT - Institut de recherche en informatique de Toulouse; LORIA (Université de Lorraine, CNRS, INRIA) (2002) ; https://hal.inria.fr/inria-00107633
93 |
Détection de séquences par sélection de l'historique : application à la reconnaissance automatique de la parole [English: Sequence detection by history selection: application to automatic speech recognition]
|
|
|
|
In: XXIVe Journées d'Etudes sur la Parole - JEP'2002, Jun 2002, Nancy, France, p. 301 (2002) ; https://hal.inria.fr/inria-00107575
94 |
Retrieving phrases by selecting the history: application to Automatic Speech Recognition
|
|
|
|
In: 7th International Conference on Spoken Language Processing - ICSLP'2002, Sep 2002, Denver, USA, p. 721 (2002) ; https://hal.inria.fr/inria-00100805
97 |
Statistical Language Model based on a Hierarchical Approach : MCnv
|
|
|
|
In: 7th European Conference on Speech Communication and Technology - EUROSPEECH 2001, Aalborg, Denmark, p. 29 (2001) ; https://hal.inria.fr/inria-00100677
98 |
A comparative study of Topic Identification on Newspaper and E-mail
|
|
|
|
In: Proceedings of the 8th International Symposium on String Processing and Information Retrieval - SPIRE'01, Laguna de San Rafael, Chile, pp. 238-241 (2001) ; https://hal.inria.fr/inria-00107535
99 |
Improving Statistical Language Models by Removing Impossible Events
|
|
|
|
In: Proceedings of the International Workshop "Speech and Computer" - SPECOM 2001, Moscow, Russia, 4 p. (2001) ; https://hal.inria.fr/inria-00100651
100 |
Experiment Analysis in Newspaper Topic Detection
|
|
|
|
In: SPIRE 2000 - String Processing & Information Retrieval, A Coruña, Spain, pp. 55-64 (2000) ; https://hal.inria.fr/inria-00099394