
Hits 41 – 60 of 267

41	TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources ...
	Romary, Laurent; Tasovac, Toma. - : Zenodo, 2019
	BASE

42	TEI and the Mixtepec-Mixtec corpus: data integration, annotation and normalization of heterogeneous data for an under-resourced language
	Bowers, Jack; Romary, Laurent. - 2019
	BASE

43	TEI and the Mixtepec-Mixtec corpus: data integration, annotation and normalization of heterogeneous data for an under-resourced language
	Bowers, Jack; Romary, Laurent. - 2019
	BASE

44	MKM – ein Metamodell für Korpusmetadaten
	Odebrecht, Carolin [author]; Lüdeling, Anke [reviewer]; Romary, Laurent [reviewer]. - Berlin : Humboldt-Universität zu Berlin, 2018
	DNB Subject Category Language

45	Tutoring Systems and Computer-Assisted Language Learning (CALL)
	Mehler, Alexander [editor]; Lobin, Henning [author]; Rösler, Dietmar [author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2018
	DNB Subject Category Language

46	[tiger2] As a standardized serialisation for ISO 24615 - SynAF
	Pareja-Lora, Antonio [author]; Zeldes, Amir [author]; Romary, Laurent [author]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2018
	DNB Subject Category Language

47	Representing human and machine dictionaries in markup languages (SGML, XML)
	Witt, Andreas [author]; Romary, Laurent [author]; Schweickard, Wolfgang [editor]. - Mannheim : Institut für Deutsche Sprache, Bibliothek, 2018
	DNB Subject Category Language

48	Bridging the Gaps between Digital Humanities, Lexicography, and Linguistics: A TEI Dictionary for the Documentation of Mixtepec-Mixtec
	Bowers, Jack; Romary, Laurent
	In: ISSN: 2160-5076 ; Dictionaries: Journal of the Dictionary Society of North America ; https://hal.inria.fr/hal-01968871 ; Dictionaries: Journal of the Dictionary Society of North America, Dictionary Society of North America, 2018, 39 (2), pp.79-106 (2018)
	BASE

49	TEI Lex-0: A Target Format for TEI-Encoded Dictionaries and Lexical Resources
	Romary, Laurent; Tasovac, Toma
	In: TEI Conference and Members' Meeting ; https://hal.inria.fr/hal-02265312 ; TEI Conference and Members' Meeting, Sep 2018, Tokyo, Japan (2018)
	BASE

50	Enhancing Usability for Automatically Structuring Digitised Dictionaries
	Khemakhem, Mohamed; Herold, Axel; Romary, Laurent
	In: GLOBALEX workshop at LREC 2018 ; https://hal.archives-ouvertes.fr/hal-01708137 ; GLOBALEX workshop at LREC 2018, May 2018, Miyazaki, Japan (2018)
	BASE

51	Retro-digitizing and Automatically Structuring a Large Bibliography Collection
	Lindemann, David; Khemakhem, Mohamed; Romary, Laurent
	In: European Association for Digital Humanities (EADH) Conference ; https://hal.archives-ouvertes.fr/hal-01941534 ; European Association for Digital Humanities (EADH) Conference, EADH, Dec 2018, Galway, Ireland (2018)
	BASE

52	A stand-off XML-TEI representation of reference annotation
	Adli, Aria; Engel, Eric; Romary, Laurent...
	In: DGfS 2018: 40. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft ; https://hal.inria.fr/hal-01876327 ; DGfS 2018: 40. Jahrestagung der Deutschen Gesellschaft für Sprachwissenschaft, Mar 2018, Stuttgart, Germany. 2017 (2018)
	BASE

53	A Diachronic Digital Edition of the Petit Larousse illustré
	Bohbot, Herve; Faucher, Alexandre; Frontini, Francesca...
	In: Journée d'étude CORLI : Traitements et standardisation des corpus multimodaux et web 2.0. ; https://hal.archives-ouvertes.fr/hal-01873805 ; Journée d'étude CORLI : Traitements et standardisation des corpus multimodaux et web 2.0., May 2018, Paris, France (2018)
	BASE

54	Automatically Encoding Encyclopedic-like Resources in TEI
	Khemakhem, Mohamed; Romary, Laurent; Gabay, Simon...
	In: The annual TEI Conference and Members Meeting ; https://hal.inria.fr/hal-01819505 ; The annual TEI Conference and Members Meeting, Sep 2018, Tokyo, Japan ; https://tei2018.dhii.asia/ (2018)
	BASE

55	TEI-Lex0 Etym -towards terse(r) recommendations for the encoding of etymological information
	Bowers, Jack; Herold, Axel; Romary, Laurent
	In: TEI Conference and Members' Meeting ; https://hal.inria.fr/hal-02075506 ; TEI Conference and Members' Meeting, Sep 2018, Tokyo, Japan (2018)
	BASE

56	Encoding Mixtepec-Mixtec Etymology in TEI
	Bowers, Jack; Romary, Laurent
	In: TEI Conference and Members' Meeting ; https://hal.inria.fr/hal-02003975 ; TEI Conference and Members' Meeting, Sep 2018, Tokyo, Japan (2018)
	BASE

57	Presenting the Nénufar Project: a Diachronic Digital Edition of the Petit Larousse Illustré
	Bohbot, Hervé; Frontini, Francesca; Luxardo, Giancarlo...
	In: GLOBALEX 2018 - Globalex workshop at LREC2018 ; https://hal.archives-ouvertes.fr/hal-01728328 ; GLOBALEX 2018 - Globalex workshop at LREC2018, May 2018, Miyazaki, Japan. pp.1-6 ; https://globalex.link/globalex2018/ (2018)
	BASE

58	MKM – ein Metamodell für Korpusmetadaten
	Odebrecht, Carolin. - : Humboldt-Universität zu Berlin, 2018
	BASE

59	TBX in ODD: Schema-agnostic specification and documentation for TermBase eXchange
	Pernes, Stefan; Romary, Laurent; Warburton, Kara
	In: LOTKS 2017- Workshop on Language, Ontology, Terminology and Knowledge Structures ; https://hal.inria.fr/hal-01581440 ; LOTKS 2017- Workshop on Language, Ontology, Terminology and Knowledge Structures, Sep 2017, Montpellier, France ; https://langandonto.github.io/LangOnto-TermiKS-2017/ (2017)
	BASE

60	Automatic Extraction of TEI Structures in Digitized Lexical Resources using Conditional Random Fields
	Khemakhem, Mohamed; Foppiano, Luca; Romary, Laurent
	In: electronic lexicography, eLex 2017 ; https://hal.archives-ouvertes.fr/hal-01508868 ; electronic lexicography, eLex 2017, Sep 2017, Leiden, Netherlands (2017)
	Abstract: A large number of digitized lexical resources remain unexploited due to their unstructured content. Manually structuring such resources is a costly task given their multifold complexity. Our goal is to find an approach to automatically structure digitized dictionaries, independently of the language or the lexicographic school or style. In this paper we present a first version of GROBID-Dictionaries, an open source machine learning system for lexical information extraction. Our approach is twofold: we perform a cascading structure extraction, and at each level we select specific features for training. We followed a "divide to conquer" strategy to dismantle text constructs in a digitized dictionary, based on the observation of their layout. Main pages (see Figure 1) in almost any dictionary share three blocks: a header (green), a footer (blue) and a body (orange). The body is, in turn, made up of several entries (red). Each lexical entry can be further decomposed (see Figure 2) into: form (green), etymology (blue), sense (red) and/or related entry. The same logic could be applied further to each extracted block, but in the scope of this paper we focus on the first three levels only. The cascading approach ensures a better understanding of the learning process's output and consequently simplifies the feature selection process. Having a limited number of exclusive text blocks per level helps significantly in diagnosing the cause of prediction errors. It allows early detection and replacement of irrelevant selected features that can bias a trained model.
With such a segmentation, it becomes more straightforward to notice that, for instance, the token position in the page is very relevant for detecting headers and footers and has almost no pertinence for capturing a sense in a lexical entry, which is very often split across two pages. To implement our approach, we took up the available infrastructure from GROBID [7], a machine learning system for the extraction of bibliographic metadata. GROBID adopts the same cascading approach and uses Conditional Random Fields (CRF) [6] to label text sequences. The output of GROBID-Dictionaries is planned to be a TEI-compliant encoding [2, 9] in which the various segmentation levels are associated with an appropriate XML tessellation. Collaboration with COST ENeL is ongoing to ensure maximal compatibility with existing dictionary projects. Our experiments so far justify our choices: models for the first two levels, trained on two different dictionary samples, have given high precision and recall with a small amount of annotated data. Relying mainly on the text layout, we tried to diversify the selected features for each model, at the token and line levels. We are working on tuning features and annotating more data to maintain the good results with new samples and to improve the third segmentation level. While only a few task-specific attempts [1] have used machine learning in this research direction, the landscape remains dominated by rule-based techniques [4, 3, 8], which are ad hoc and costly, even impossible, to adapt to new lexical resources.
	Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; [STAT.ML]Statistics [stat]/Machine Learning [stat.ML]; automatic structuring; CRF; digitized dictionaries; machine learning; TEI
	URL: https://hal.archives-ouvertes.fr/hal-01508868v2/document https://hal.archives-ouvertes.fr/hal-01508868 https://hal.archives-ouvertes.fr/hal-01508868v2/file/eLex-2017-Template.pdf
	BASE
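
	Illustration: the cascading idea in the abstract above (segment the page first, then work inside the body, choosing different features at each level) can be sketched roughly as follows. This is not the GROBID-Dictionaries code. All function names and feature names are hypothetical, each body line is assumed to be one entry, and a trivial rule-based labeler stands in for the trained CRF model so that the sketch runs with the standard library only.

```python
from typing import Callable, Dict, List

Features = Dict[str, object]


def page_level_features(lines: List[str]) -> List[Features]:
    """Level 1 (page): one feature dict per line.

    Position in the page is included because it is informative
    for spotting headers and footers.
    """
    n = len(lines)
    return [
        {
            "rel_pos": round(i / max(n - 1, 1), 2),
            "is_first": i == 0,
            "is_last": i == n - 1,
            "is_short": len(line.split()) <= 3,
        }
        for i, line in enumerate(lines)
    ]


def entry_level_features(tokens: List[str]) -> List[Features]:
    """Level 3 (inside an entry): one feature dict per token.

    Page position is deliberately left out here, since an entry
    may be split across two pages.
    """
    return [
        {
            "lower": tok.lower(),
            "is_capitalized": tok[:1].isupper(),
            "has_bracket": "[" in tok or "]" in tok,
            "is_digit": tok.rstrip(".").isdigit(),
        }
        for tok in tokens
    ]


def toy_page_labeler(feats: List[Features]) -> List[str]:
    """Hypothetical stand-in for a trained sequence model (e.g. a CRF)."""
    labels = []
    for f in feats:
        if f["is_first"] and f["is_short"]:
            labels.append("header")
        elif f["is_last"] and f["is_short"]:
            labels.append("footer")
        else:
            labels.append("body")
    return labels


def cascade(
    lines: List[str],
    page_labeler: Callable[[List[Features]], List[str]] = toy_page_labeler,
) -> Dict[str, object]:
    """Run level 1, then build level-3 features only for body lines.

    Assumption: every body line is treated as one entry, which skips
    the real entry-segmentation step (level 2).
    """
    labels = page_labeler(page_level_features(lines))
    body = [line for line, lab in zip(lines, labels) if lab == "body"]
    return {
        "labels": labels,
        "entry_features": [entry_level_features(b.split()) for b in body],
    }
```

	Because each level sees only its own block and its own features, a wrong label can be traced to one level; a trained sequence labeler could replace toy_page_labeler without changing cascade.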
