
Search in the Catalogues and Directories

Hits 1 – 2 of 2

1
Similarité par recouvrement de séquences pour la fouille de données séquentielles et textuelles
In: Extraction et Gestion des Connaissances (EGC 2019) ; https://hal.archives-ouvertes.fr/hal-01999965 ; Extraction et Gestion des Connaissances (EGC 2019), Jan 2019, Metz, France. pp.105-116 ; https://editions-rnti.fr/?procid=100176 (2019)
BASE
2
Two Multilingual Corpora Extracted from the Tenders Electronic Daily for Machine Learning and Machine Translation Applications
In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) ; https://hal.archives-ouvertes.fr/hal-01865091 ; Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), May 2018, Miyazaki, Japan (2018)
Abstract: The European "Tenders Electronic Daily" (TED) is a large source of semi-structured, multilingual data that is very valuable to the Natural Language Processing community. These data sets can be used effectively to address complex machine translation, multilingual terminology extraction, and text-mining tasks, or to benchmark information retrieval systems. Although the public web site that publishes the EU calls for tenders is user-friendly, collecting and managing this kind of data is a great burden and consumes a lot of time and computing resources. This could explain why the resource is rarely, if ever, exploited today by computer scientists or engineers in NLP. The aim of this paper is to describe two documented and easy-to-use multilingual corpora (one of which is a parallel corpus), extracted from the TED web source, that we will release for the benefit of the NLP community.
Keyword: [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Call for Tender; European Languages; Multilingual corpora; Natural Language Resource; Parallel Corpus
URL: https://hal.archives-ouvertes.fr/hal-01865091/file/LREC-832.pdf
https://hal.archives-ouvertes.fr/hal-01865091/document
https://hal.archives-ouvertes.fr/hal-01865091
BASE
