41 |
Coreference and anaphoric annotations for spontaneous speech corpora in French.
|
|
|
|
In: 8th Discourse Anaphora and Anaphor Resolution Colloquium ; https://halshs.archives-ouvertes.fr/halshs-00764786 ; 8th Discourse Anaphora and Anaphor Resolution Colloquium, Oct 2011, Faro, Portugal. pp.182-190 (2011)
|
|
BASE
|
|
|
|
42 |
An Analysis of the Performances of the CasEN Named Entities Recognition System in the Ester2 Evaluation Campaign
|
|
|
|
In: https://hal.archives-ouvertes.fr/hal-00502370 ; 2010 (2010)
|
|
BASE
|
|
|
|
43 |
Dénomination et anaphore lexicale - Le réseau sémantique de Prolexbase
|
|
|
|
In: Construction d'identité et processus d'identification ; https://hal.archives-ouvertes.fr/hal-01067232 ; S.N. Osu, G. Col, N. Garric et F. Toupin. Construction d'identité et processus d'identification, Peter Lang Editions, pp.151-163, 2010 (2010)
|
|
BASE
|
|
|
|
44 |
Reconnaissance d'entités nommées : enrichissement d'un système à base de connaissances à partir de techniques de fouille de textes
|
|
|
|
In: Traitement Automatique des Langues Naturelles ; https://hal.archives-ouvertes.fr/hal-00568758 ; Traitement Automatique des Langues Naturelles, Jul 2010, Montréal, Canada (2010)
|
|
BASE
|
|
|
|
45 |
Who are you, you who speak? Transducer cascades for information retrieval
|
|
|
|
In: 4th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics ; https://hal.archives-ouvertes.fr/hal-01174643 ; 4th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, Nov 2009, Poznań, Poland (2009)
|
|
Abstract:
This paper deals with a survey corpus and presents the retrieval of information about the speaker. We used finite-state transducer cascades, and we present detailed results with an evaluation. This work is part of a French project to enhance the ESLO corpus (a sociolinguistic survey conducted in the city of Orléans). The survey was carried out in 1968; the project aims to save the recordings in digital format, to transcribe them, and to enrich the transcriptions with annotations in XML format. This work was supported by a French ANR contract (ANR-06-CORP-023) and by European funds from Région Centre (FEDER). The corpus is a collection of 200 interviews with questions about life in the city of Orléans (How long have you lived in Orléans?, What led you to live in Orléans?, Do you like living in Orléans?, etc.) and questions about the occupation or the family of the speaker, complemented by recordings made in professional or private contexts. The recording situations vary: interviews, discussions between friends, hidden-microphone recordings, interviews with political, academic and religious personalities, and conversations between a social worker and parents at the Psycho Medical Center of Orléans. In total, there are 300 hours of speech, estimated at 4,500,000 words. More precisely, we worked on almost 120 transcribed hours, representing 112 Transcriber XML files and 32 577 Kb. We worked on 105 files (31 004 Kb) and evaluated the results on 7 files (1 573 Kb, 5.1%). The transcription files have no punctuation marks, but the first letter of proper names is capitalized and acronyms are fully capitalized. We used the CasSys system (Friburger, Maurel, 2004), which processes texts with transducer cascades (Abney, 1996). The cascades we used are hand-built: each transducer describes a local grammar for the recognition of certain entities. Sometimes this recognition requires two or more transducers applied in succession, in a specific order.
More precisely, we used two cascades. The first one, for named entity recognition, was built some years ago for a newspaper corpus, and we adapted it to the oral corpus during the project. The second one aims at discovering information about the speaker in three domains: origin (is he/she a native of Orléans, or where does he/she come from?), family (is he/she married, with or without children?) and occupation (what is his/her occupation? where does he/she work?). We call this information "designating entities". This second cascade was built specifically for the project. CasSys applies transducers using the Unitex software (Paumier, 2003), which requires the text to be segmented in a preprocessing step. For written text, this segmentation usually relies on sentence boundary detection (Friburger et al., 2000). Our corpus has no punctuation, so we chose to use the Transcriber XML tags to perform the segmentation, and also to hide the content inside the tags during the named entity task, since it is sometimes ambiguous with entities in context (Dister, 2007).
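The cascade principle described in this abstract can be illustrated with a minimal Python sketch. It is not the CasSys or Unitex implementation: regular expressions stand in for the hand-built transducers, and the two patterns, the `<place>` and `<origin>` tag names, and the sample transcript are hypothetical. It only shows segmentation on XML tags (so that tag content is never matched) and the ordered application of steps, where the second step depends on the output of the first.

```python
import re

# Hypothetical two-step cascade: each step is (pattern, replacement).
# Regexes are stand-ins for local grammars, not the actual CasSys graphs.
CASCADE = [
    # Step 1: a capitalized token after "in" is marked as a place name.
    (re.compile(r"\b(in) ([A-Z][\w-]+)"), r"\1 <place>\2</place>"),
    # Step 2: uses the annotation produced by step 1 to mark the
    # speaker's origin (an illustrative "designating entity").
    (re.compile(r"\b(I was born in) (<place>[^<]+</place>)"),
     r"<origin>\1 \2</origin>"),
]


def segment(transcript: str) -> list[str]:
    """Split a transcript on XML tags; tag content is discarded, so
    attribute values can never be mistaken for entities."""
    parts = re.split(r"<[^>]+>", transcript)
    return [p.strip() for p in parts if p.strip()]


def run_cascade(text: str) -> str:
    """Apply every step in order; each step sees the previous output."""
    for pattern, replacement in CASCADE:
        text = pattern.sub(replacement, text)
    return text


if __name__ == "__main__":
    # Illustrative sample only: no punctuation, proper names capitalized.
    sample = ('<Turn speaker="spk1"><Sync time="0.0"/>I was born in Orléans'
              '<Sync time="2.1"/>and I work in Tours</Turn>')
    for seg in segment(sample):
        print(run_cascade(seg))
```

Reversing the two steps would leave the origin unmarked, because step 2 matches only text that step 1 has already annotated; this is the sense in which the order of transducers in a cascade matters.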
|
|
Keyword:
[INFO]Computer Science [cs]; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; Information retrieval; Named entity task; Survey; Transducer cascades
|
|
URL: https://hal.archives-ouvertes.fr/hal-01174643 https://hal.archives-ouvertes.fr/hal-01174643/file/ltc-2009-Maurel.pdf https://hal.archives-ouvertes.fr/hal-01174643/document
|
|
BASE
|
|
|
|
46 |
Temporal Expressions: Comparisons in a Multilingual Corpus
|
|
|
|
In: Human Language Technologies as a Challenge for Computer Science and Linguistics ; 4th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics ; https://hal.archives-ouvertes.fr/hal-01024150 ; 4th Language & Technology Conference: Human Language Technologies as a Challenge for Computer Science and Linguistics, 2009, Poznan, Poland. pp.531-535 (2009)
|
|
BASE
|
|
|
|
47 |
Explorer des corpus à l'aide de CasSys. Application au Corpus d'Orléans
|
|
|
|
In: G.Willems (ed.), Texte et corpus n°4, Actes des 6es Journées Internationales de Linguistique de Corpus (JLC). ; Journées de Linguistique de Corpus ; https://hal.archives-ouvertes.fr/hal-01174606 ; Journées de Linguistique de Corpus, Sep 2009, Lorient, France. pp.189-196 (2009)
|
|
BASE
|
|
|
|
49 |
Prolexbase. A multilingual relational lexical database of proper names
|
|
|
|
In: LREC ; Sixth language resources and evaluation conference ; https://hal.archives-ouvertes.fr/hal-01024056 ; Sixth language resources and evaluation conference, 2008, Marrakech, Morocco (2008)
|
|
BASE
|
|
|
|
50 |
Prolexbase : Une base de données lexicale de noms propres pour le Tal
|
|
|
|
In: Colloque Lexicographie et informatique : bilan et perspectives ; https://hal.archives-ouvertes.fr/hal-01030489 ; Colloque Lexicographie et informatique : bilan et perspectives, Jan 2008, Nancy, France. pp.137-144 (2008)
|
|
BASE
|
|
|
|
51 |
Prolexbase et LMF: vers un standard pour les ressources lexicales sur les noms propres
|
|
|
|
In: ISSN: 1248-9433 ; EISSN: 1965-0906 ; Revue TAL ; https://hal.archives-ouvertes.fr/hal-01021179 ; Revue TAL, ATALA (Association pour le Traitement Automatique des Langues), 2008, 49 (1), pp.61-88 ; https://www.atala.org/content/prolexbase-et-lmf-vers-un-standard-pour-les-ressources-lexicales-sur-les-noms-propres (2008)
|
|
BASE
|
|
|
|
52 |
Compression method for natural language automata
|
|
|
|
In: Finite-State Methods and Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-01024076 ; Finite-State Methods and Natural Language Processing, 2008, Ispra, Italy. pp.146-157 (2008)
|
|
BASE
|
|
|
|
53 |
Compression de dictionnaires électroniques
|
|
|
|
In: Neuvièmes journées internationales d'analyse statistique des données textuelles ; https://hal.archives-ouvertes.fr/hal-01030743 ; Neuvièmes journées internationales d'analyse statistique des données textuelles, 2008, Lyon, France. pp.1103-1114 (2008)
|
|
BASE
|
|
|
|
54 |
Automates et morphologie. Autour des noms propres, quelques réflexions sur la flexion en français
|
|
|
|
In: Linguistics, Computer Science and Language Processing ; https://hal.archives-ouvertes.fr/hal-01067215 ; Gaston Gross, Klaus U.Schulz. Linguistics, Computer Science and Language Processing, College Publications, pp.189-203, 2008 (2008)
|
|
BASE
|
|
|
|
55 |
Balisage XML des entités nommées et dénommantes du corpus Eslo
|
|
|
|
In: 1st Cataloguing and Encoding of Spoken Language Data (CatCod) ; https://hal.archives-ouvertes.fr/hal-01048597 ; 1st Cataloguing and Encoding of Spoken Language Data (CatCod), Dec 2008, Orléans, France (2008)
|
|
BASE
|
|
|
|
|
|