Hits 1 – 20 of 22

1	UDLex: Towards Cross-language Subcategorization Lexicons
	Rambelli, Giulia; Lenci, Alessandro; Poibeau, Thierry
	In: Proceedings of the Fourth International Conference on Dependency Linguistics ; Fourth International Conference on Dependency Linguistics (Depling 2017) ; https://hal.archives-ouvertes.fr/hal-01856180 ; Fourth International Conference on Dependency Linguistics (Depling 2017), University of Pisa, Sep 2017, Pisa, Italy (2017)
	BASE

2	Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields
	Khemakhem, Mohamed; Foppiano, Luca; Romary, Laurent
	In: electronic lexicography, eLex 2017 ; https://hal.archives-ouvertes.fr/hal-01508868 ; electronic lexicography, eLex 2017, Sep 2017, Leiden, Netherlands (2017)
	Abstract: A large number of digitized lexical resources remain unexploited because their content is unstructured. Manually structuring such resources is costly given their multifold complexity. Our goal is to find an approach that automatically structures digitized dictionaries, independently of the language or the lexicographic school or style. In this paper we present a first version of GROBID-Dictionaries, an open source machine learning system for lexical information extraction. Our approach is twofold: we perform a cascading structure extraction, and at each level we select specific features for training. We followed a "divide to conquer" strategy to dismantle the text constructs in a digitized dictionary, based on the observation of their layout. Main pages (see Figure 1) in almost any dictionary share three blocks: a header (green), a footer (blue) and a body (orange). The body is, in turn, made up of several entries (red). Each lexical entry can be further decomposed (see Figure 2) into form (green), etymology (blue), sense (red) and/or related entry. The same logic could be applied further to each extracted block, but in the scope of this paper we focus on the first three levels only. The cascading approach gives a better understanding of the output of the learning process and consequently simplifies feature selection. Having a limited number of exclusive text blocks per level helps significantly in diagnosing the cause of prediction errors. It allows irrelevant selected features that can bias a trained model to be detected and replaced early. With such a segmentation it becomes more straightforward to notice that, for instance, the token position in the page is very relevant for detecting headers and footers and has almost no pertinence for capturing a sense in a lexical entry, which is very often split across two pages. To implement our approach, we built on the available infrastructure of GROBID [7], a machine learning system for the extraction of bibliographic metadata. GROBID adopts the same cascading approach and uses Conditional Random Fields (CRF) [6] to label text sequences. GROBID-Dictionaries is planned to output a TEI-compliant encoding [2, 9] in which the various segmentation levels are associated with an appropriate XML tessellation. Collaboration with COST ENeL is ongoing to ensure maximal compatibility with existing dictionary projects. Our experiments so far justify our choices: models for the first two levels, trained on two different dictionary samples, have given high precision and recall with a small amount of annotated data. Relying mainly on the text layout, we tried to diversify the selected features for each model, at the token and line levels. We are working on tuning features and annotating more data to maintain the good results with new samples and to improve the third segmentation level. While only a few task-specific attempts [1] have used machine learning in this research direction, the landscape remains dominated by rule-based techniques [4, 3, 8], which are ad hoc and costly, even impossible, to adapt to new lexical resources.
	Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; automatic structuring; CRF; digitized dictionaries; machine learning; TEI
	URL: https://hal.archives-ouvertes.fr/hal-01508868v2/document https://hal.archives-ouvertes.fr/hal-01508868 https://hal.archives-ouvertes.fr/hal-01508868v2/file/eLex-2017-Template.pdf
	BASE

3	Relations extraction to populate a knowledge base from Tweets ; Extraction de relations pour le peuplement d'une base de connaissance à partir de tweets
	Lopez, Cédric; Cabrio, Elena; Segond, Frédérique
	In: EGC2017 - Conférence Extraction et Gestion des Connaissances ; https://hal.archives-ouvertes.fr/hal-01473718 ; EGC2017 - Conférence Extraction et Gestion des Connaissances, Jan 2017, Grenoble, France ; http://egc2017.imag.fr/ (2017)
	BASE

4	Ontolex JeuxDeMots and Its Alignment to the Linguistic Linked Open Data Cloud
	Tchechmedjiev, Andon; Mandon, Théophile; Lafourcade, Mathieu...
	In: 16th International Semantic Web Conference ; ISWC: International Semantic Web Conference ; https://hal-lirmm.ccsd.cnrs.fr/lirmm-01615473 ; ISWC: International Semantic Web Conference, Oct 2017, Vienna, Austria. pp.678-693, ⟨10.1007/978-3-319-68288-4_40⟩ ; https://iswc2017.semanticweb.org (2017)
	BASE

5	FrenchSentiClass : an Automated System for French Sentiment Classification ; FrenchSentiClass : un Système Automatisé pour la Classification de Sentiments en Français
	Tapi Nzali, Mike Donald; Abdaoui, Amine; Azé, Jérôme...
	In: Actes de l’atelier DEFT de la conférence TALN 2017 ; DEFT: Défi Fouille de Texte ; https://hal-lirmm.ccsd.cnrs.fr/lirmm-01563411 ; DEFT: Défi Fouille de Texte, Jun 2017, Orléans, France (2017)
	BASE

6	Automatic Measures to Characterise Verbal Alignment in Human-Agent Interaction
	Dubuisson Duplessis, Guillaume; Clavel, Chloé; Landragin, Frédéric
	In: 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) ; https://hal.archives-ouvertes.fr/hal-01577813 ; 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), Aug 2017, Saarbrücken, Germany. pp.71-81 ; http://www.sigdial.org/workshops/conference18 (2017)
	BASE

7	Neural Networks for Multi-Word Expression Detection
	Klyueva, Natalia; Doucet, Antoine; Straka, Milan
	In: Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) ; https://hal.archives-ouvertes.fr/hal-03025446 ; Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), Apr 2017, Valencia, Spain. pp.60-65, ⟨10.18653/v1/W17-1707⟩ (2017)
	BASE

8	Pronunciation and disfluency modeling for expressive speech synthesis ; Modélisation de la prononciation et des disfluences pour la synthèse de la parole expressive
	Qader, Raheel. - : HAL CCSD, 2017
	In: https://hal.inria.fr/tel-01668014 ; Artificial Intelligence [cs.AI]. Université Rennes 1, 2017. English. ⟨NNT : 2017REN1S076⟩ (2017)
	BASE

9	Named entity recognition within Arabic text and their semantic relations ; Extraction d'information à partir d'un texte arabe : extraction des entités nommées et leurs relations sémantiques
	Doumi, Noureddine. - : HAL CCSD, 2017
	In: https://hal.archives-ouvertes.fr/tel-01716911 ; Intelligence artificielle [cs.AI]. Université Djillali Liabes de Sidi Bel Abbès, 2017. Français (2017)
	BASE

10	Information Extraction for the Seed Development Regulatory Networks of Arabidopsis thaliana ; Extraction d’Information pour les réseaux de régulation de la graine chez Arabidopsis thaliana.
	Valsamou, Dialekti. - : HAL CCSD, 2017
	In: https://hal.inrae.fr/tel-02786135 ; Artificial Intelligence [cs.AI]. Université Paris Saclay (COMUE), 2017. English (2017)
	BASE

11	Use of deep learning in the context of Poorly endowed languages
	Fadili, Hammou
	In: 24e Conférence sur le Traitement Automatique de la Langue Naturelle (TALN) ; https://hal-cnam.archives-ouvertes.fr/hal-02555530 ; 24e Conférence sur le Traitement Automatique de la Langue Naturelle (TALN), Jun 2017, Orléans, France (2017)
	BASE

12	Delayed interpretation, shallow processing and constructions: the basis of the "interpret whenever possible" principle
	Blache, Philippe
	In: Cognitive Approach to Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-01907628 ; Cognitive Approach to Natural Language Processing, 2017 (2017)
	BASE

13	Online Learning of Task-specific Word Representations with a Joint Biconvex Passive-Aggressive Algorithm
	Denis, Pascal; Ralaivola, Liva
	In: European Chapter of the Association for Computational Linguistics ; https://hal.inria.fr/hal-01590594 ; European Chapter of the Association for Computational Linguistics, Apr 2017, Valencia, Spain. pp.775 - 784, ⟨10.18653/v1/E17-1073⟩ (2017)
	BASE

14	Authenticity in a Digital Era: Still a Document Process ; Authenticity in a Digital Era: Still a Document Process: The Case of Laboratory Notebooks
	Tosi, Lorraine; Bénel, Aurélien
	In: ACM Symposium on Document Engineering ; https://hal-utt.archives-ouvertes.fr/hal-02363351 ; ACM Symposium on Document Engineering, Sep 2017, Valletta, Malta. pp.109-112, ⟨10.1145/3103010.3121034⟩ (2017)
	BASE

15	Archives numériques et construction du sens ou « Comment échapper au Web sémantique ? »
	Bénel, Aurélien
	In: ISSN: 0016-5522 ; La Gazette des Archives ; https://hal-utt.archives-ouvertes.fr/hal-02372470 ; La Gazette des Archives, Association des archivistes français, 2017, Meta/morphoses. Les archives, bouillons de culture numérique, 245 (1), pp.163-177 ; https://www.archivistes.org/Meta-morphoses-Les-archives-bouillons-de-culture-numerique (2017)
	BASE

16	Automatic enjambment detection as a new source of evidence in Spanish versification
	Martínez Cantón, Clara; Ruiz, Pablo; Gonzalez-Blanco, Elena...
	In: https://hal.archives-ouvertes.fr/hal-01722359 ; 2017 (2017)
	BASE

17	Dialogue management in task-oriented dialogue systems
	Blache, Philippe
	In: 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents (ISIAA'17) ; https://hal.archives-ouvertes.fr/hal-01708376 ; 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents (ISIAA'17), Nov 2017, Glasgow, United Kingdom. ⟨10.1145/3139491.3139507⟩ (2017)
	BASE

18	« Nous nous arrachâmes promptement avec ma caisse » : quels descripteurs linguistiques caractérisent les registres de langue ?
	Mekki, Jade; Battistelli, Delphine; Béchet, Nicolas...
	In: https://hal.inria.fr/hal-01649948 ; [Rapport Technique] IRISA, équipe EXPRESSION; MoDyCo. 2017 (2017)
	BASE

19	Machine Translation
	Poibeau, Thierry. - : HAL CCSD, 2017. : MIT Press, 2017
	In: https://hal.archives-ouvertes.fr/hal-01674140 ; MIT Press, 2017, 9780262534215 ; https://mitpress.mit.edu/books/machine-translation-0 (2017)
	BASE

20	Querying biomedical Linked Data with natural language questions
	Hamon, Thierry; Grabar, Natalia; Mougin, Fleur
	In: ISSN: 1570-0844 ; EISSN: 2210-4968 ; Semantic Web – Interoperability, Usability, Applicability ; https://hal.archives-ouvertes.fr/hal-01426686 ; Semantic Web – Interoperability, Usability, Applicability, IOS Press, 2017, 8, pp.581-599. ⟨10.3233/SW-160244⟩ (2017)
	BASE
