List of Corpora

Name Size Description Language ELRA Details Your Selection
2006 CoNLL Shared Task - Ten Languages 85.2 Mb 2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006… Turkish; Bulgarian; Japanese; Danish; Por… ELRA-W0086 Details
2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish 45 Mb 2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as pa… Turkish; Czech; Catalan; Basque … ELRA-W0121 Details
2007 CoNLL Shared Task - Greek, Hungarian & Italian 18 Mb 2007 CoNLL Shared Task - Greek, Hungarian & Italian consists of dependency treebanks in three languages used as part of… Italian; Modern Greek (1453-); Hungarian … ELRA-W0122 Details
Al-Hayat Arabic Corpus 1.1 Gb The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open… Arabic ELRA-W0030 Details
Amaryllis Corpus - Evaluation Package 505 Mb Launched at the end of 1995, the AMARYLLIS project aimed at evaluating information retrieval software for French text c… French ELRA-W0029 Details
Amharic-English bilingual corpus 15 Mb The Amharic-English bilingual corpus contains parallel text from legal and news domains in Amharic script, in translite… English; Amharic ELRA-W0074 Details
An-Nahar Newspaper Text Corpus 794 Mb The An-Nahar Lebanon Newspaper Text Corpus comprises articles in standard Arabic from 1995 to 2000 (6 years) stored as … Arabic ELRA-W0027 Details
Arbobanko (Esperanto Treebank) 12 Mb The Arbobanko (Esperanto Treebank) is a 52,000 token dependency treebank of Esperanto with texts from the MONATO news m… Esperanto ELRA-W0129 Details
Arboretum treebank 26 Mb The Arboretum treebank is a morphologically and syntactically annotated repository of Danish sentences, taken from Korp… Danish ELRA-W0084 Details
ARCADE/ROMANSEVAL corpus 63 Mb The ARCADE/ROMANSEVAL corpus was used as a reference corpus in two international competitions: ARCADE, an exercise on … English; Italian; French ELRA-W0018 Details
A "scientific" corpus of modern French ("La Recherche" magazine) - Complete version 23 Mb This "scientific" corpus of modern French was produced by the University of Nantes (France) within the European Commiss… French ELRA-W0025-02 Details
Bilingual Bulgarian-English corpus from the 2018 Proposal for a National Climate Change Adaptation Strategy and Action Plan from the website of the Bulgarian Ministry of Environment and Water (Processed) 12 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0263 Details
Bilingual Bulgarian-English corpus from the National Revenue Agency (BG) (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0173 Details
Bilingual collection of documents about the Cyprus Problem (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0132 Details
Bilingual collection of reports of the Greek Public Power Corporation (Processed) 13 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0244 Details
Bilingual Croatian-English Parallel Corpus (Processed) 18 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0204 Details
Bilingual documents Bulgarian-English in the field of ICT and Transport (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0133 Details
Bilingual documents Bulgarian-English in the field of open data, broadband and information society (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0134 Details
Bilingual documents Bulgarian-English in the field of transport (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0161 Details
Bilingual hr-en parallel corpus from Croatian Mine Action website (Processed) 12 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0131 Details
Bilingual hr-en parallel corpus from Croatian National Bank website (Processed) 8 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0226 Details
Bilingual hr-en parallel corpus from the Journal of the Croatian Association of Civil Engineers website (Processed) 12 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0273 Details
Bilingual hr-en parallel corpus from the National and University Library in Zagreb website (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0135 Details
Bilingual resource with Bulgarian strategic documents in the field of innovations and digital growth (Bulgarian - English) (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0153 Details
Bilingual resource with Bulgarian strategic documents in the field of telecommunications and broadband (Bulgarian - English) (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0171 Details
BMI Brochures 2011-2015 (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; German ELRA-W0200 Details
BMI Brochures and Website 2016 (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; German ELRA-W0199 Details
BMVI Publications (Processed) 5 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; German ELRA-W0197 Details
BMVI Website (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; German ELRA-W0198 Details
Catalan Corpus of News Articles 645 Mb The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. These articles … Catalan ELRA-W0047 Details
Catalan-Spanish Parallel Corpus 686 Mb This corpus contains more than 100 million words and it contains 10 years of bilingual articles from “El Periódico de C… Spanish; Catalan ELRA-W0053 Details
Central Statistical Office Dataset (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0174 Details
Chinese-Vietnamese Parallel Corpus 74 Mb The Chinese-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sente… Chinese; Vietnamese ELRA-W0312 Details
CINTIL-DeepBank 213 Mb The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical r… Portuguese ELRA-W0062 Details
CINTIL-DependencyBank 1.4 Mb The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency g… Portuguese ELRA-W0061 Details
CINTIL-PropBank 3.6 Mb The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, compos… Portuguese ELRA-W0056 Details
CINTIL-TreeBank 3.1 Mb The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 11… Portuguese ELRA-W0055 Details
Civil Aviation Regulations (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0186 Details
Compendium The Social Insurance Institution (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0225 Details
Convention against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment - United Nations (French-English-Greek) (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; French; Modern Greek (1453-) … ELRA-W0309 Details
Convention on the transfer of sentenced persons (English - Greek) (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0196 Details
Corpus of Contemporaneous Spanish Novels 4.8 Mb This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporaneous aut… Spanish ELRA-W0041 Details
Corpus of Icelandic texts from the Central Bank of Iceland (Processed) 33 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Icelandic ELRA-W0298 Details
Corpus of State-related content from the Latvian Web (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Latvian ELRA-W0169 Details
Corpus on Finance and Economics from Bank of Latvia (Processed) 6 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Latvian ELRA-W0216 Details
CRATER 2 Corpus 359 Mb The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of… English; French; Spanish ELRA-W0033 Details
CRATER corpus 276 Mb The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-Frenc… English; French; Spanish ELRA-W0003 Details
Croatian-English corpus with Acts on Biological and Landscape Diversity and Environmental Protection (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0142 Details
Croatian-English corpus with statistical reports and studies from the Croatian Bureau of Statistics website (Processed) 9 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0264 Details
Croatian-English corpus with studies on the challenges to the Croatian Accession to the European Union from the Croatian Institute of Public Finance website (Processed) 9 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0266 Details
Croatian-English corpus with the Rural Development Programme for the Period 2014-2020 from the Croatian Rural Development Programme website (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0295 Details
Croatian-English parallel corpus from the website of the Croatian Journal of Fisheries (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0294 Details
Croatian-English parallel corpus from the website of the Embassy of Finland, Zagreb (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0292 Details
Croatian-English parallel corpus from the website of the Government Office for Cooperation with NGOs (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0291 Details
Croatian-English parallel corpus from the website of the Ministry of Foreign and European Affairs, Republic of Croatia (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0293 Details
DA-EN Danish Ministry of Higher Education and Science 2 (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Danish ELRA-W0157 Details
DA-EN Danish Ministry of Higher Education and Science 3 (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Danish ELRA-W0155 Details
DA-EN Danish Ministry of Higher Education and Science 4 (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Danish ELRA-W0172 Details
DA-EN Danish Ministry of Higher Education and Science (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Danish ELRA-W0166 Details
Danish Propbank 18 Mb The Danish Propbank (DPB) is a multi-layer treebank, annotated not only with morphosyntactic, but also with semantic in… Danish ELRA-W0117 Details
deL1L2IM corpus 2.8 Mb The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the… German ELRA-W0083 Details
Dutch PAROLE Distributable Corpus 70 Mb The Dutch PAROLE Distributable Corpus is a 3-million-word selection from the 20-million-word Dutch PAROLE Reference c… Dutch ELRA-W0019 Details
ECI-ELSNET Italian & German tagged sub-corpus 3 Mb The objective is to provide a small but fine-grained morphosyntactically tagged corpus, 50,000 running words for each o… Italian; German ELRA-W0005 Details
ECI/MCI (European Corpus Initiative/Multilingual Corpus I) 655 Mb The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual cor… Turkish; Bulgarian; English; Estonian; It… ELRA-W0004 Details
ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1 802 Mb The European Comparable and Parallel Corpora of Parliamentary Speeches Archive (ECPC), compiled at the Universitat Jaum… English; Spanish ELRA-W0128 Details
EJTN Handbook (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0163 Details
Ema-lon Manipuri Corpus (including word embedding and language model) The Ema-lon Manipuri Corpus consists of a set of resources for Manipuri language (locally known as Meiteilon) for the p… English; Manipuri ELRA-W0316 Details
Employment in Poland 2009 report in EN-PL (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0242 Details
English-Chinese-Vietnamese Trilingual Parallel Corpus 6 Mb The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The cor… English; Chinese; Vietnamese … ELRA-W0314 Details
English - Croatian parallel corpus from texts of the Swedish Crime Victim Compensation and Support Authority (Brottsoffermyndigheten) web site (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0238 Details
English-Danish Parallel corpus from Tatoeba project (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Danish ELRA-W0214 Details
English-Estonian corpus from Finnish Information Bank (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Estonian ELRA-W0218 Details
English-Estonian Parallel corpus compiled from translated annual reports from Estonian Academy of Sciences 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Estonian ELRA-W0265 Details
English-Finnish corpus from Finnish Information Bank (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Finnish ELRA-W0217 Details
English-Icelandic parallel corpus from Statistics Iceland (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Icelandic ELRA-W0219 Details
English-Nepali Parallel Corpus 47 Mb The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali … English; Nepali (macrolanguage) … ELRA-W0077 Details
English-Norwegian parallel corpus from Forbruker Europa, 2017 release (Processed) 6 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Norwegian Bokmål ELRA-W0195 Details
English-Persian parallel corpus 287 Mb The English-Persian parallel corpus contains more than 200,000 aligned sentences across a variety of text types from th… English; Persian ELRA-W0118 Details
English-Persian parallel Corpus 40 Mb Please refer to ELRA-W0118 for the latest version of this corpus. This version consists of about 3,500,000 English and … English; Persian ELRA-W0051 Details
ENGLISH/POLISH PHRASE BOOK FOR ADMINISTRATIVE STAFF of LOCAL GOVERNMENT UNITS (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0227 Details
English-Slovak corpus of annual reports from the Slovak National Centre for Human Rights website (Processed) 5 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Slovak ELRA-W0137 Details
English-Slovak corpus of annual reports on immigration and asylum policies from the EMN National Contact Point for the Slovak Republic website (Processed) 6 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Slovak ELRA-W0136 Details
English-Slovak parallel corpus of texts from The Ministry of Culture of the Slovak Republic (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Slovak ELRA-W0188 Details
English-Slovak parallel corpus of texts from The Ministry of Justice of the Slovak Republic (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Slovak ELRA-W0189 Details
English-Swedish corpus from Finnish Information Bank (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0222 Details
English-Swedish parallel corpus from Annual Reports of the Swedish Pension System (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0268 Details
English - Swedish parallel corpus from texts of the Swedish Crime Victim Compensation and Support Authority (Brottsoffermyndigheten) web site (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0237 Details
English-Swedish parallel corpus from the Annual Overview of Sweden’s Official aid Agency SIDA Activities (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0269 Details
English-Swedish parallel corpus from the translation of 'Sweden a Pocket Guide' book (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0130 Details
English-Swedish parallel corpus from the web site of the Swedish Migration Board - Migrationsverket (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0239 Details
English-Swedish parallel texts from The Swedish Agency for Economic and Regional Growth - Tillväxtverket (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0240 Details
English-Vietnamese Parallel Corpus 166 Mb This is a corpus of 500,000 English-Vietnamese sentence pairs, built to develop SMT (Statistical Machine Translation) s… English; Vietnamese ELRA-W0124 Details
English-Vietnamese Parallel Corpus 397 Mb The English-Vietnamese Parallel Corpus consists of 1,000,000 sentence pairs, with an average length of 20 words per sen… English; Vietnamese ELRA-W0311 Details
EUIPO - IP case law French-English (Processed) 56 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; French ELRA-W0138 Details
EUIPO - IP case law German-English (Processed) 154 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; German ELRA-W0140 Details
EUIPO - IP case law Italian-English (Processed) 22 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Italian ELRA-W0141 Details
EUIPO - IP case law Spanish-English (Processed) 74 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Spanish ELRA-W0139 Details
EUIPO - list of goods and services French and English (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; French ELRA-W0149 Details
EUIPO - list of goods and services German and English (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; German ELRA-W0143 Details
EUIPO - list of goods and services German and French (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… French; German ELRA-W0145 Details
EUIPO - list of goods and services German and Italian (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Italian; German ELRA-W0146 Details
EUIPO - list of goods and services German and Spanish (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… German; Spanish ELRA-W0144 Details
EUIPO - list of goods and services Italian and English (Processed) 8 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Italian ELRA-W0150 Details
EUIPO - list of goods and services Italian and French (Processed) 11 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Italian; French ELRA-W0152 Details
EUIPO - list of goods and services Italian and Spanish (Processed) 11 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Italian; Spanish ELRA-W0151 Details
EUIPO - list of goods and services Spanish and English (Processed) 8 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Spanish ELRA-W0147 Details
EUIPO - list of goods and services Spanish and French (Processed) 11 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… French; Spanish ELRA-W0148 Details
EUROPARL Corpus Parallel Corpora: Portuguese-English 2.3 Gb The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora), was extracted from the proceedings of the Eur… English; Portuguese ELRA-W0090 Details
Expression of interest (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0209 Details
Financial Stability Reports from the National Bank of Poland (2013-14) (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0228 Details
Financial Stability Reports from the National Bank of Poland (2015-16) (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0229 Details
GeFRePaC - German French Reciprocal Parallel Corpus 1.3 Gb The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by the Multilinguale Forschung/Multilingual Resear… French; German ELRA-W0031 Details
General Romanian-English bilingual corpus (Processed) 75 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Romanian ELRA-W0193 Details
Greek anti-corruption legislation and National Anti-Corruption Plan (greek-english) (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0164 Details
Greek-English parallel corpus from the website of the Prime Minister of the Hellenic Republic (Processed) 5 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0272 Details
Hallituskausi 2007-2011 -- Finnish-English Translation Memory (Processed) 23 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Finnish ELRA-W0220 Details
Hallituskausi 2011-2015 -- Finnish-English Translation Memory (Processed) 14 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Finnish ELRA-W0221 Details
Hellenic Ministry of Foreign Affairs Greek-English announcements corpus (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0271 Details
Helsinki Corpus of Swahili 1117 Mb This is a text corpus of Swahili language of 25 million words, annotated for part-of-speech, morphology and syntax. The… Swahili (macrolanguage) ELRA-W0119 Details
ICE-GB (British English component of the International Corpus of English) 97 Mb ICE-GB is the British component of the International Corpus of English (ICE). ICE began in 1990 with the primary aim of… English ELRA-W0021 Details
ILSP/ELEFTHEROTYPIA Corpus (Greek corpus) 27 Mb The ILSP/ELEFTHEROTYPIA Corpus contains approximately 3 million words classified and annotated according to the common … Modern Greek (1453-) ELRA-W0022 Details
International Agreements (Processed) 20 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Latvian ELRA-W0158 Details
Italian Syntactic-Semantic Treebank (ISST) 90 Mb ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in … Italian ELRA-W0044 Details
Karl May Korpus (KMK) 77 Mb The "Karl-May-Korpus" is a monolingual German corpus, available in an SGML-tagged ASCII text format. It contains the wo… German ELRA-W0016 Details
Khresmoi manually annotated reference corpus 1.3 Gb The Manually Annotated Reference Corpus is a collection of English web documents annotated with key entities (such as d… English ELRA-W0081 Details
Korean-Vietnamese Parallel Corpus 62 Mb The Korean-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per senten… Korean; Vietnamese ELRA-W0313 Details
Laws of Malta (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Maltese ELRA-W0234 Details
Legal texts from Estonian Ministry of Justice (Processed) 23 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Estonian ELRA-W0167 Details
"Le Monde Diplomatique" Arabic tagged corpus 59 Mb This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic, see al… Arabic ELRA-W0049 Details
"Le Monde Diplomatique" Text corpus in Arabic 57 Mb Electronic archiving of "Le Monde Diplomatique" articles in Arabic from 2000. The corpus is available in HTML. Each HTM… Arabic ELRA-W0036-04 Details
"Le Monde Diplomatique" Text corpus in English 28 Mb Electronic archiving of "Le Monde Diplomatique" articles in English from 1999. The corpus is available in HTML. Each HT… English ELRA-W0036-03 Details
"Le Monde Diplomatique" Text corpus in French - archives 1980-1998 233 Mb Electronic archiving of "Le Monde Diplomatique" articles in French from 1980 to 1998. The corpus is available in HTML. … French ELRA-W0036-01 Details
"Le Monde Diplomatique" Text corpus in French - archives from 1999 90 Mb Electronic archiving of "Le Monde Diplomatique" articles in French from 1999. The corpus is available in HTML. Each HTM… French ELRA-W0036-02 Details
Letter of rights for persons arrested and or detained (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English; French; Modern Greek … ELRA-W0308 Details
Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English; Italian; Dutch; Frenc… ELRA-W0301 Details
LT Corpus 43 Mb The LT Corpus is composed of 70 fiction texts from Portuguese renowned authors. The corpus contains 1,781,083 tokens. T… Portuguese ELRA-W0059 Details
Luxembourg Museum Websites (de-en) (Processed) 45 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; French; German ELRA-W0201 Details
Macroeconomic Developments (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0207 Details
Malta Government Gazette (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Maltese ELRA-W0233 Details
Maltese-English website parallel corpus (Processed) 10 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Maltese ELRA-W0232 Details
Memorandum for a ESM programme (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0210 Details
Methodological Reconciliation (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0208 Details
MLCC Multilingual and Parallel Corpora 915 Mb The MLCC text corpus has two main components - one set to allow comparable studies to be carried out in different langu… English; Italian; Dutch; French; German; … ELRA-W0023 Details
Modern French Corpus including Anaphors Tagging 13 Mb The corpus that includes the tagging of the anaphors was created by the CRISTAL-GRESEC (Stendhal-Grenoble 3 University,… French ELRA-W0032 Details
Monolingual documents from the Government of Lithuania (Processed) 10 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Lithuanian ELRA-W0299 Details
Monolingual Greek corpus 5.1 Mb Monolingual Greek corpus of 1 million words. The corpus consists of articles written in 1996 from the Greek daily newsp… Modern Greek (1453-) ELRA-W0014 Details
Monolingual Vietnamese Annotated Corpus 36 Mb The Monolingual Vietnamese Annotated Corpus consists of 100,000 sentences, manually annotated with word boundaries, POS… Vietnamese ELRA-W0310 Details
MTP Annotated German corpus - tagged version 35 Mb This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP… German ELRA-W0008-02 Details
MTP Annotated German corpus - untagged version 283 Mb This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP… German ELRA-W0008-01 Details
MULTEXT JOC Corpus 114 Mb This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-… English; Italian; French; German; Spanish… ELRA-W0017 Details
Multilingual Corpus 9.9 Mb Multilingual parallel corpus produced by KAIST KORTERM containing 60,000 expressions in Korean, Chinese and English. English; Korean; Chinese ELRA-W0035 Details
National Health Fund Dataset (Processed) 5 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0178 Details
Natolin European Centre Dataset (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0176 Details
NE3L named entities Arabic corpus 3 Mb The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with nam… Arabic ELRA-W0078 Details
NE3L named entities Chinese corpus 4.8 Mb The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with nam… Chinese ELRA-W0079 Details
NE3L named entities Russian corpus 2.7 Mb The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with nam… Russian ELRA-W0080 Details
NEMLAR Written Corpus 136 Mb This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the sa… Arabic ELRA-W0042 Details
Nepali Monolingual written corpus 683 Mb The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali … Nepali (macrolanguage) ELRA-W0076 Details
Normalized Arabic Fragments for Inestimable Stemming (NAFIS) 1 Mb Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold standard corpus composed by a c… Arabic ELRA-W0127 Details
NPChunks 412 Kb NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randoml… Portuguese ELRA-W0089 Details
NUM 5M Mongolian written corpus 65 Mb This is a corpus of Mongolian text mostly from domains like online or printed daily newspapers, literature, and laws. Th… Mongolian ELRA-W0120 Details
PANACEA English-French and English-Greek parallel corpus acquired for Environment domain 11 Mb The PANACEA English-French and English-Greek parallel corpus was acquired in the framework of the PANACEA project (Plat… English; French ELRA-W0057 Details
PANACEA English-French and English-Greek parallel corpus acquired for Labour Legislation domain 16 Mb The PANACEA English-French and English-Greek parallel corpus was acquired in the framework of the PANACEA project (Plat… English; Modern Greek (1453-) … ELRA-W0058 Details
PANACEA Environment English monolingual corpus 2.7 Gb The PANACEA Environment English monolingual corpus was acquired in the framework of the PANACEA project (Platform for A… English ELRA-W0063 Details
PANACEA Environment French monolingual corpus 2.1 Gb The PANACEA Environment French monolingual corpus was acquired in the framework of the PANACEA project (Platform for Au… French ELRA-W0065 Details
PANACEA Environment Greek monolingual corpus 2 Gb The PANACEA Environment Greek monolingual corpus was acquired in the framework of the PANACEA project (Platform for Aut… Modern Greek (1453-) ELRA-W0067 Details
PANACEA Environment Italian monolingual corpus 1.8 Gb The PANACEA Environment Italian monolingual corpus was acquired in the framework of the PANACEA project (Platform for A… Italian ELRA-W0069 Details
PANACEA Environment Spanish monolingual corpus 2.3 Gb The PANACEA Environment Spanish monolingual corpus was acquired in the framework of the PANACEA project (Platform for A… Spanish ELRA-W0071 Details
PANACEA Labour English monolingual corpus 1.6 Gb The PANACEA Labour English monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa… English ELRA-W0064 Details
PANACEA Labour French monolingual corpus 2.5 Gb The PANACEA Labour French monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automat… French ELRA-W0066 Details
PANACEA Labour Greek monolingual corpus 1.4 Gb The PANACEA Labour Greek monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automati… Modern Greek (1453-) ELRA-W0068 Details
PANACEA Labour Italian monolingual corpus 2.4 Gb The PANACEA Labour Italian monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa… Italian ELRA-W0070 Details
PANACEA Labour Spanish monolingual corpus 1.9 Gb The PANACEA Labour Spanish monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa… Spanish ELRA-W0072 Details
Parallel corpus (Bulgarian - English) in the public administration domain (Processed) 9 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0211 Details
Parallel corpus (en-pl) from the Export Promotion Portal of Poland (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0247 Details
Parallel corpus from Bank of Estonia (Processed) 8 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Estonian ELRA-W0162 Details
Parallel corpus from Estonian Cabinet of Ministers (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Estonian ELRA-W0154 Details
Parallel corpus from Estonian Ministry of Foreign Affairs (Processed) 12 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Estonian ELRA-W0168 Details
Parallel corpus from Parliament of Estonia (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Estonian ELRA-W0215 Details
Parallel corpus from Social Insurance Agency -- Försäkringskassan (Sweden) (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0213 Details
Parallel corpus from the website of the Chancellery of the Prime Minister of Poland (Processed) 6 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0249 Details
Parallel Corpus from the Web Site of the MFA of Latvia (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Latvian ELRA-W0159 Details
Parallel corpus (Greek - English) in the law domain (Processed) (Part1) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0205 Details
Parallel corpus (Greek - English) in the public administration domain (Processed) 14 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0203 Details
Parallel corpus (Polish - English) from the website of the Polish Investment and Trade Agency (Processed) 8 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0212 Details
Parallel Global Voices (Bulgarian - English) (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English ELRA-W0297 Details
Parallel Global Voices (English - Polish) (Processed) 28 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0241 Details
Parallel Global Voices (Greek - English) (Processed) 43 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0202 Details
Parallel texts from Swedish Labour market agency. Part 2 (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish; French; German; Spanish… ELRA-W0300 Details
Parallel texts from Swedish Labour market agency (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish; French; German; Spanish… ELRA-W0302 Details
Parallel texts from Swedish National Food Agency (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish; French; Spanish; Polish… ELRA-W0305 Details
Parallel texts from Swedish Social Security Authority (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Italian; Swedish; French; German… ELRA-W0303 Details
Parallel texts from Swedish Work environment Authority (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… Bulgarian; English; Estonian; Italian; Li… ELRA-W0304 Details
Parallel texts from the Swedish Competition Authority - Konkurrensverket (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0231 Details
PAROLE French Corpus 349 Mb The PAROLE French corpus contains the following data: Miscellaneous: Data provided by ELRA (CRATER, MLCC Multilingual an… French ELRA-W0020 Details
PAROLE Irish Distributable Corpus 25 Mb The PAROLE Irish Distributable Corpus consists of over 8 million words (a subset of the 15+ million words Irish Referen… Irish ELRA-W0026 Details
PAROLE Italian Corpus 44 Mb The PAROLE Italian Corpus comprises 3,135,651 words collected from four different domains: newspapers: 2,179,800 words… Italian ELRA-W0043 Details
PAROLE Portuguese Corpus - complete version 57 Mb The PAROLE Portuguese corpus contains approximately 3 million running words of European Portuguese distributed by Mediu… Portuguese ELRA-W0024-01 Details
Persian 1984 corpus (Multext-East framework) 5.9 Mb This corpus contains the Persian (Farsi) translation of a part of the novel “1984” (G. Orwell) annotated in the Multext… Persian ELRA-W0054 Details
Persian Ezafe Construction Dataset The Persian Ezafe Construction Dataset includes gold Ezafe tags in almost 30 thousand Persian sentences. The sentences … Persian ELRA-W0315 Details
PKN Orlen Dataset (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0175 Details
Polish-English parallel corpus from the website "Business in Poland" (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0274 Details
Polish-English parallel corpus from the website "geoportal.gov.pl" (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0285 Details
Polish-English parallel corpus from the website of Public Employment Services in Poland (member of EURES network) (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0259 Details
Polish-English parallel corpus from the website of the Central Statistical Office (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0279 Details
Polish-English parallel corpus from the website of the Citizens Information Board (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0251 Details
Polish-English parallel corpus from the website of the ING Polish Art Foundation (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0261 Details
Polish-English parallel corpus from the website of the Institute of Mathematics of the Polish Academy of Sciences (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0283 Details
Polish-English parallel corpus from the website of the Ministry of Agriculture and Rural Development (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0252 Details
Polish-English parallel corpus from the website of the Ministry of Culture and National Heritage (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0257 Details
Polish-English parallel corpus from the website of the Ministry of Development (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0253 Details
Polish-English parallel corpus from the website of the Ministry of Digital Affairs (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0284 Details
Polish-English parallel corpus from the website of the Ministry of Digitization (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0255 Details
Polish-English parallel corpus from the website of the Ministry of Foreign Affairs (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0256 Details
Polish-English parallel corpus from the website of the Ministry of Justice (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0254 Details
Polish-English parallel corpus from the website of the Ministry of National Defence (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0250 Details
Polish-English parallel corpus from the website of the Ministry of Regional Development (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0282 Details
Polish-English parallel corpus from the website of the Ministry of Science and Higher Education (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0286 Details
Polish-English parallel corpus from the website of the Ministry of the Interior and Administration (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0258 Details
Polish-English parallel corpus from the website of the National Audiovisual Institute (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0289 Details
Polish-English parallel corpus from the website of the National Centre for Nuclear Research (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0278 Details
Polish-English parallel corpus from the website of the National Centre for Research and Development (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0280 Details
Polish-English parallel corpus from the website of the National Digital Archives (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0290 Details
Polish-English parallel corpus from the website of the National Science Centre (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0260 Details
Polish-English parallel corpus from the website of the National Security Bureau (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0262 Details
Polish-English parallel corpus from the website of the Office of the Commissioner for Human Rights (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0281 Details
Polish-English parallel corpus from the website of the Polish Tourism Organisation (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0276 Details
Polish-English parallel corpus from the website of the State Marine Accident Investigation Commission (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0288 Details
Polish-English parallel corpus from the website of the U.S. EMBASSY and CONSULATE IN POLAND (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0277 Details
Polish-English parallel corpus from the website "Polish Aid" (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0275 Details
Polish-English parallel corpus from the website "Science in Poland" (Processed) 18 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0287 Details
Polish Food 4 & Food Policy Dataset (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0179 Details
Polish Food Dataset 2 (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0180 Details
Polish Food DataSet 3 (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0181 Details
Polish Food Dataset (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0177 Details
Polish Ministry of Foreign Affairs Historical Dataset (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0183 Details
Polish Ministry of Foreign Affairs Regional Dataset (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0182 Details
Polish Ministry of Foreign Affairs reports in EN and PL (Processed) 3 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0235 Details
Polish Ministry of Foreign Affairs Youth 2011 Report (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0184 Details
Portuguese-English bilingual corpus from Legislation concerning the Portuguese Parliament (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Portuguese ELRA-W0245 Details
Portuguese-English bilingual corpus from the Portuguese Constitution (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Portuguese ELRA-W0246 Details
PRESS 65 6.3 Mb Språkdata has made available the first of its many Swedish corpora, PRESS 65. It consists of one million running words … Swedish ELRA-W0010 Details
PTPARL Corpus 25 Mb The PTPARL Corpus contains 1,076 texts consisting of adapted transcriptions of the Portuguese Parliament sessions. The … Portuguese ELRA-W0060 Details
Public Procurement Dataset 1 (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0187 Details
Public Procurement Dataset 2 (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Polish ELRA-W0185 Details
Quaero Old Press Extended Named Entity corpus 6.8 Gb The Quaero Old Press Extended Named Entity corpus consists of the manual annotation of 76 newspaper issues published in… French ELRA-W0073 Details
Qualified POS Tagged Corpus 66 Mb Monolingual corpus in a .txt format, produced by KAIST KORTERM, containing 1,020,000 eojeols (Korean terms) in Korean. Th… Korean ELRA-W0034 Details
Quarterly Reports of the Parliamentary Budget Office (Hellenic Parliament) (Processed) 15 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Modern Greek (1453-) … ELRA-W0243 Details
ROCO Romanian journalistic corpus 729 Mb ROCO is a Romanian journalistic corpus containing approximately 7.1 million tokens, the number of types being 231,626. … Romanian ELRA-W0085 Details
Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website (Processed) 8 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Romanian ELRA-W0270 Details
Romanian - English literature corpus (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Romanian ELRA-W0192 Details
Romanian – English New Criminal Procedure Code (Processed) 4 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Romanian ELRA-W0170 Details
Romanian - English news corpus (Processed) 63 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Romanian ELRA-W0194 Details
Romanian Ombudsman archive (Processed) 5 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Romanian ELRA-W0206 Details
ROMBAC - Romanian balanced corpus 1.1 Gb ROMBAC is a Romanian corpus containing equal shares of texts from 5 different genres: journalism, legalese, fiction, me… Romanian ELRA-W0088 Details
Secretariat-General parallel corpus SL-EN and EN-SL (part 1) (Processed) 34 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Slovenian ELRA-W0190 Details
Secretariat-General parallel corpus SL-EN and EN-SL (part 2) (Processed) 39 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Slovenian ELRA-W0191 Details
SIP Publications (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; French; German ELRA-W0306 Details
Slovenian-English corpus with statistical reports from the Statistical Office of the Republic of Slovenia website (Processed) 9 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Slovenian ELRA-W0267 Details
Spanish-English website parallel corpus (Processed) 9 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Spanish ELRA-W0248 Details
Tagged text in French (MEMODATA) with rules of morphological disambiguation 3.1 Gb More than 170 books (classical novels, legal texts...) are tagged with rules of morphological disambiguation. A tagged … French ELRA-W0012 Details
Tagged text in French (MEMODATA) with typographic tags 247 Mb More than 170 books (classical novels, legal texts...) are tagged with typographic tags. A tagged corpus of 50 books is… French ELRA-W0011 Details
Text corpus of "Le Monde" 3.9 Gb Electronic archiving of "Le Monde" articles started on 1 January 1987. Some 200 articles are added every day, and as of… French ELRA-W0015 Details
The CINTIL Corpus – International Corpus of Portuguese 20 Mb CINTIL-Corpus Internacional do Português is a linguistically interpreted written and spoken corpus of European Portugue… Portuguese ELRA-W0050 Details
The Coimisineir Teanga Bilingual Corpus of Reference Documents (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Irish ELRA-W0224 Details
The Coimisineir Teanga Bilingual Corpus of Reports and Press Releases (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Irish ELRA-W0230 Details
The Croatian-English corpus with the nature protection strategy of Croatia (Processed) 1 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Croatian ELRA-W0296 Details
The EMILLE/CIIL Corpus 1.5 Gb The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen mo… Urdu; Telugu; Tamil; Marathi; Malayalam; … ELRA-W0037 Details
The Gaois bilingual corpus of English-Irish legislation (Processed) 26 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Irish ELRA-W0223 Details
The Lancaster Corpus of Mandarin Chinese (LCMC) 45 Mb The Lancaster Corpus of Mandarin Chinese (LCMC) is designed as a Chinese match for the FLOB and FROWN corpora for moder… Chinese ELRA-W0039 Details
TRAD Arabic-English Mailing lists Parallel corpus - Development set 2 Mb This is a parallel corpus of 10,000 words in Arabic and a reference translation in English. The source texts are emails… English; Arabic ELRA-W0108 Details
TRAD Arabic-English Mailing lists Parallel corpus - Test set 2 Mb This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are email… English; Arabic ELRA-W0106 Details
TRAD Arabic-English Newspaper Parallel corpus - Test set 1 2 Mb This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are artic… English; Arabic ELRA-W0099 Details
TRAD Arabic-English Parallel corpus of transcribed Broadcast News Speech 2 Mb This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are trans… English; Arabic ELRA-W0102 Details
TRAD Arabic-English Web domain (blogs) Parallel corpus 2 Mb This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are blog … English; Arabic ELRA-W0104 Details
TRAD Arabic-French Mailing lists Parallel corpus - Development set 1 Mb This is a parallel corpus of 10,000 words in Arabic and a reference translation in French. The source texts are emails … Arabic; French ELRA-W0107 Details
TRAD Arabic-French Mailing lists Parallel corpus - Test set 2 Mb This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are emails… Arabic; French ELRA-W0105 Details
TRAD Arabic-French Newspaper Parallel corpus - Test set 1 2 Mb This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are articl… Arabic; French ELRA-W0098 Details
TRAD Arabic-French Newspaper Parallel corpus - Test set 2 2 Mb This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in French. The source texts are articl… Arabic; French ELRA-W0100 Details
TRAD Arabic-French Parallel corpus of transcribed Broadcast News Speech 2 Mb This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are transc… Arabic; French ELRA-W0101 Details
TRAD Arabic-French Web domain (blogs) Parallel corpus 2 Mb This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are blog a… Arabic; French ELRA-W0103 Details
TRAD Chinese-English Email Parallel corpus – Development Set 1 Mb This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in E… English; Chinese ELRA-W0113 Details
TRAD Chinese-English Email Parallel corpus – Test Set 1 Mb This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … English; Chinese ELRA-W0115 Details
TRAD Chinese-English News Articles Parallel corpus 1 Mb This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … English; Chinese ELRA-W0112 Details
TRAD Chinese-English Web domain (blogs) Parallel corpus 1 Mb This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … English; Chinese ELRA-W0110 Details
TRAD Chinese-French Email Parallel corpus – Development Set 2 Mb This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in F… Chinese; French ELRA-W0114 Details
TRAD Chinese-French Email Parallel corpus – Test Set 2 Mb This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … Chinese; French ELRA-W0116 Details
TRAD Chinese-French News Articles Parallel corpus 2 Mb This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … Chinese; French ELRA-W0111 Details
TRAD Chinese-French Web domain (blogs) Parallel corpus 2 Mb This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … Chinese; French ELRA-W0109 Details
TRAD Pashto-English News Articles Parallel corpus 602 Kb This is a parallel corpus, which contains 10,000 Pashto words translated into English by two different translators. The… English; Pushto ELRA-W0097 Details
TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data 575 Kb This is a parallel corpus, which contains 10,000 Pashto words translated into English. The source texts come from 3 bro… English; Pushto ELRA-W0095 Details
TRAD Pashto-French News Articles Parallel corpus 970 Kb This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The … Pushto; French ELRA-W0096 Details
TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data 29 Mb This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The … Pushto; French ELRA-W0094 Details
TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data 473 Mb The corpus consists of the transcription of 106 hours of recordings in Pashto translated into French. The transcription… Pushto; French ELRA-W0093 Details
TRAD Pashto Monolingual text Corpus 2.2 Gb This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different b… Pushto ELRA-W0092 Details
Training and test data for Arabizi detection and transliteration 1 Mb The dataset is composed of two distinct resources: 1) A collection of mixed English and Arabizi text intended to train a… English; Arabic ELRA-W0126 Details
Translation memories from The Ministry of Foreign Affairs of Norway (Processed) 620 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Norwegian ELRA-W0156 Details
Translation memory from Swedish National Audit Office (NAO) - Riksrevisionen (Processed) 12 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Swedish ELRA-W0236 Details
Translations of Lithuanian legislation from Seimas of the Republic of Lithuania (Processed) 70 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Lithuanian ELRA-W0165 Details
Trilingual Documents related to International Judicial Cooperation in Civil Matters (Greek-English-French) (Processed) 2 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; French; Modern Greek (1453-) … ELRA-W0307 Details
TSNLP (Test Suites for NLP Testing) 4.5 Mb The TSNLP project (LRE 62-089) has produced a database of test suites for English, French and German containing over 4,… English; French; German ELRA-W0013 Details
Venice Italian Treebank (VIT) 149 Mb The Venice Italian Treebank (VIT) is the result of a collaboration among people working at the Laboratory of Computationa… Italian ELRA-W0040 Details
Website of the President of the Republic of Lithuania (Processed) 7 Mb This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… English; Lithuanian ELRA-W0160 Details
Wolverhampton Business English Corpus 118 Mb The WBE was created by the Computational Linguistics Group at the University of Wolverhampton through funding from ELRA i… English ELRA-W0028 Details