
Search in the Catalogues and Directories

Hits 1 – 9 of 9

1
Mining and exploiting domain-specific corpora in the PANACEA platform
Bel Rafecas, Núria; Prokopidis, Prokopis; Toral, Antonio; Arranz, Victoria; Papavassiliou, Vassilis. - : ELRA (European Language Resources Association)
Abstract: The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition, production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. In order to extrinsically evaluate the CAC methodology, we have conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of Machine Translation Systems.
Keyword: Boilerplate removal; Corpus acquisition; IPR for language resources; Web crawling
URL: http://hdl.handle.net/10230/20416
BASE
2
Criteria for evaluation of resources, technology and integration
BASE
3
Third evaluation report. Evaluation of PANACEA v3 and produced resources
BASE
4
Monolingual corpus acquired in five languages and two domains
BASE
5
Third version (v4) of the integrated platform and documentation
BASE
6
PANACEA (Platform for Automatic, Normalised Annotation and Cost-Effective Acquisition of Language Resources for Human Language Technologies)
Bel Rafecas, Núria; Poch, Marc; Toral, Antonio. - : European Association for Machine Translation
BASE
7
Initial functional prototype and documentation describing the initial CAA subsystem and its components
BASE
8
Second version (v2) of the integrated platform and documentation
BASE
9
Language Resources Factory: case study on the acquisition of Translation Memories
Toral, Antonio; Bel Rafecas, Núria; Poch, Marc. - : ACL (Association for Computational Linguistics)
BASE

Hits by source type
Catalogues: 0
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 9
© 2013 - 2024 Lin|gu|is|tik