Search in the Catalogues and Directories

Page: 1 2
Hits 1 – 20 of 22

1
UDLex: Towards Cross-language Subcategorization Lexicons
In: Proceedings of the Fourth International Conference on Dependency Linguistics ; Fourth International Conference on Dependency Linguistics (Depling 2017) ; https://hal.archives-ouvertes.fr/hal-01856180 ; Fourth International Conference on Dependency Linguistics (Depling 2017), University of Pisa, Sep 2017, Pisa, Italy (2017)
BASE
2
Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields
In: electronic lexicography, eLex 2017 ; https://hal.archives-ouvertes.fr/hal-01508868 ; electronic lexicography, eLex 2017, Sep 2017, Leiden, Netherlands (2017)
Abstract: A large number of digitized lexical resources remain unexploited because their content is unstructured. Manually structuring such resources is costly, given their multifold complexity. Our goal is to find an approach to automatically structure digitized dictionaries, independently of the language or the lexicographic school or style. In this paper we present a first version of GROBID-Dictionaries, an open source machine learning system for lexical information extraction. Our approach is twofold: we perform a cascading structure extraction, and at each level we select specific features for training. We followed a "divide to conquer" strategy to dismantle text constructs in a digitized dictionary, based on the observation of their layout. Main pages (see Figure 1) in almost any dictionary share three blocks: a header (green), a footer (blue) and a body (orange). The body is in turn made up of several entries (red). Each lexical entry can be further decomposed (see Figure 2) into: form (green), etymology (blue), sense (red) and/or related entry. The same logic could be applied further to each extracted block, but in the scope of this paper we focus on the first three levels only. The cascading approach ensures a better understanding of the learning process's output and consequently simplifies the feature selection process. Having a limited set of exclusive text blocks per level helps significantly in diagnosing the cause of prediction errors. It allows early detection and replacement of irrelevant selected features that can bias a trained model. In such a segmentation, it becomes more straightforward to notice that, for instance, the token position in the page is very relevant for detecting headers and footers and has almost no relevance for capturing a sense in a lexical entry, which is very often split across two pages. To implement our approach, we took up the available infrastructure from GROBID [7], a machine learning system for the extraction of bibliographic metadata. GROBID adopts the same cascading approach and uses Conditional Random Fields (CRF) [6] to label text sequences. The output of GROBID-Dictionaries is planned to be a TEI-compliant encoding [2, 9] where the various segmentation levels are associated with an appropriate XML tessellation. Collaboration with COST ENeL is ongoing to ensure maximal compatibility with existing dictionary projects. Our experiments so far justify our choices: models for the first two levels, trained on two different dictionary samples, have achieved high precision and recall with a small amount of annotated data. Relying mainly on the text layout, we tried to diversify the selected features for each model, at the token and line levels. We are working on tuning features and annotating more data to maintain the good results with new samples and to improve the third segmentation level. While only a few task-specific attempts [1] have used machine learning in this research direction, the landscape remains dominated by rule-based techniques [4, 3, 8], which are ad hoc and costly, even impossible, to adapt to new lexical resources.
Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; automatic structuring; CRF; digitized dictionaries; machine learning; TEI
URL: https://hal.archives-ouvertes.fr/hal-01508868v2/document
https://hal.archives-ouvertes.fr/hal-01508868
https://hal.archives-ouvertes.fr/hal-01508868v2/file/eLex-2017-Template.pdf
BASE
3
Relations extraction to populate a knowledge base from Tweets ; Extraction de relations pour le peuplement d'une base de connaissance à partir de tweets
In: EGC2017 - Conférence Extraction et Gestion des Connaissances ; https://hal.archives-ouvertes.fr/hal-01473718 ; EGC2017 - Conférence Extraction et Gestion des Connaissances, Jan 2017, Grenoble, France ; http://egc2017.imag.fr/ (2017)
BASE
4
Ontolex JeuxDeMots and Its Alignment to the Linguistic Linked Open Data Cloud
In: 16th International Semantic Web Conference ; ISWC: International Semantic Web Conference ; https://hal-lirmm.ccsd.cnrs.fr/lirmm-01615473 ; ISWC: International Semantic Web Conference, Oct 2017, Vienna, Austria. pp.678-693, ⟨10.1007/978-3-319-68288-4_40⟩ ; https://iswc2017.semanticweb.org (2017)
BASE
5
FrenchSentiClass : an Automated System for French Sentiment Classification ; FrenchSentiClass : un Système Automatisé pour la Classification de Sentiments en Français
In: Actes de l’atelier DEFT de la conférence TALN 2017 ; DEFT: Défi Fouille de Texte ; https://hal-lirmm.ccsd.cnrs.fr/lirmm-01563411 ; DEFT: Défi Fouille de Texte, Jun 2017, Orléans, France (2017)
BASE
6
Automatic Measures to Characterise Verbal Alignment in Human-Agent Interaction
In: 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL) ; https://hal.archives-ouvertes.fr/hal-01577813 ; 18th Annual Meeting of the Special Interest Group on Discourse and Dialogue (SIGDIAL), Aug 2017, Saarbrücken, Germany. pp.71-81 ; http://www.sigdial.org/workshops/conference18 (2017)
BASE
7
Neural Networks for Multi-Word Expression Detection
In: Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017) ; https://hal.archives-ouvertes.fr/hal-03025446 ; Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), Apr 2017, Valencia, Spain. pp.60-65, ⟨10.18653/v1/W17-1707⟩ (2017)
BASE
8
Pronunciation and disfluency modeling for expressive speech synthesis ; Modélisation de la prononciation et des disfluences pour la synthèse de la parole expressive
Qader, Raheel. - : HAL CCSD, 2017
In: https://hal.inria.fr/tel-01668014 ; Artificial Intelligence [cs.AI]. Université Rennes 1, 2017. English. ⟨NNT : 2017REN1S076⟩ (2017)
BASE
9
Named entity recognition within Arabic text and their semantic relations ; Extraction d'information à partir d'un texte arabe : extraction des entités nommées et leurs relations sémantiques
Doumi, Noureddine. - : HAL CCSD, 2017
In: https://hal.archives-ouvertes.fr/tel-01716911 ; Artificial Intelligence [cs.AI]. Université Djillali Liabes de Sidi Bel Abbès, 2017. French (2017)
BASE
10
Information Extraction for the Seed Development Regulatory Networks of Arabidopsis thaliana ; Extraction d’Information pour les réseaux de régulation de la graine chez Arabidopsis thaliana.
Valsamou, Dialekti. - : HAL CCSD, 2017
In: https://hal.inrae.fr/tel-02786135 ; Artificial Intelligence [cs.AI]. Université Paris Saclay (COMUE), 2017. English (2017)
BASE
11
Use of deep learning in the context of Poorly endowed languages
In: 24e Conférence sur le Traitement Automatique de la Langue Naturelle (TALN) ; https://hal-cnam.archives-ouvertes.fr/hal-02555530 ; 24e Conférence sur le Traitement Automatique de la Langue Naturelle (TALN), Jun 2017, Orléans, France (2017)
BASE
12
Delayed interpretation, shallow processing and constructions: the basis of the "interpret whenever possible" principle
In: Cognitive Approach to Natural Language Processing ; https://hal.archives-ouvertes.fr/hal-01907628 ; Cognitive Approach to Natural Language Processing, 2017 (2017)
BASE
13
Online Learning of Task-specific Word Representations with a Joint Biconvex Passive-Aggressive Algorithm
In: European Chapter of the Association for Computational Linguistics ; https://hal.inria.fr/hal-01590594 ; European Chapter of the Association for Computational Linguistics, Apr 2017, Valencia, Spain. pp.775 - 784, ⟨10.18653/v1/E17-1073⟩ (2017)
BASE
14
Authenticity in a Digital Era: Still a Document Process ; Authenticity in a Digital Era: Still a Document Process: The Case of Laboratory Notebooks
In: ACM Symposium on Document Engineering ; https://hal-utt.archives-ouvertes.fr/hal-02363351 ; ACM Symposium on Document Engineering, Sep 2017, Valletta, Malta. pp.109-112, ⟨10.1145/3103010.3121034⟩ (2017)
BASE
15
Archives numériques et construction du sens ou « Comment échapper au Web sémantique ? »
In: ISSN: 0016-5522 ; La Gazette des Archives ; https://hal-utt.archives-ouvertes.fr/hal-02372470 ; La Gazette des Archives, Association des archivistes français, 2017, Meta/morphoses. Les archives, bouillons de culture numérique, 245 (1), pp.163-177 ; https://www.archivistes.org/Meta-morphoses-Les-archives-bouillons-de-culture-numerique (2017)
BASE
16
Automatic enjambment detection as a new source of evidence in Spanish versification
In: https://hal.archives-ouvertes.fr/hal-01722359 ; 2017 (2017)
BASE
17
Dialogue management in task-oriented dialogue systems
In: 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents (ISIAA'17) ; https://hal.archives-ouvertes.fr/hal-01708376 ; 1st ACM SIGCHI International Workshop on Investigating Social Interactions with Artificial Agents (ISIAA'17), Nov 2017, Glasgow, United Kingdom. ⟨10.1145/3139491.3139507⟩ (2017)
BASE
18
« Nous nous arrachâmes promptement avec ma caisse » : quels descripteurs linguistiques caractérisent les registres de langue ?
In: https://hal.inria.fr/hal-01649948 ; [Technical Report] IRISA, EXPRESSION team; MoDyCo. 2017 (2017)
BASE
19
Machine Translation
Poibeau, Thierry. - : HAL CCSD, 2017. : MIT Press, 2017
In: https://hal.archives-ouvertes.fr/hal-01674140 ; MIT Press, 2017, 9780262534215 ; https://mitpress.mit.edu/books/machine-translation-0 (2017)
BASE
20
Querying biomedical Linked Data with natural language questions
In: ISSN: 1570-0844 ; EISSN: 2210-4968 ; Semantic Web – Interoperability, Usability, Applicability ; https://hal.archives-ouvertes.fr/hal-01426686 ; Semantic Web – Interoperability, Usability, Applicability, IOS Press, 2017, 8, pp.581-599. ⟨10.3233/SW-160244⟩ (2017)
BASE
