List of Corpora

Name	Size	Description	Language	ELRA ID	Details
2006 CoNLL Shared Task - Ten Languages	85.2 Mb	2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006…	…	ELRA-W0086	Details
2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish	45 Mb	2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as pa…	…	ELRA-W0121	Details
2007 CoNLL Shared Task - Greek, Hungarian & Italian	18 Mb	2007 CoNLL Shared Task - Greek, Hungarian & Italian consists of dependency treebanks in three languages used as part of…	…	ELRA-W0122	Details
Al-Hayat Arabic Corpus	1.1 Gb	The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open…	…	ELRA-W0030	Details
Amaryllis Corpus - Evaluation Package	505 Mb	Launched at the end of 1995, the AMARYLLIS project aimed at evaluating information retrieval software for French text c…	…	ELRA-W0029	Details
Amharic-English bilingual corpus	15 Mb	The Amharic-English bilingual corpus contains parallel text from legal and news domains in Amharic script, in translite…	…	ELRA-W0074	Details
An-Nahar Newspaper Text Corpus	794 Mb	The An-Nahar Lebanon Newspaper Text Corpus comprises articles in standard Arabic from 1995 to 2000 (6 years) stored as …	…	ELRA-W0027	Details
Arbobanko (Esperanto Treebank)	12 Mb	The Arbobanko (Esperanto Treebank) is a 52,000 token dependency treebank of Esperanto with texts from the MONATO news m…	…	ELRA-W0129	Details
Arboretum treebank	26 Mb	The Arboretum treebank is a morphologically and syntactically annotated repository of Danish sentences, taken from Korp…	…	ELRA-W0084	Details
ARCADE/ROMANSEVAL corpus	63 Mb	The ARCADE/ROMANSEVAL corpus was used as a reference corpus in two international competitions: ARCADE, an exercise on …	…	ELRA-W0018	Details
A "scientific" corpus of modern French ("La Recherche" magazine) - Complete version	23 Mb	This "scientific" corpus of modern French was produced by the University of Nantes (France) within the European Commiss…	…	ELRA-W0025-02	Details
Bilingual Bulgarian-English corpus from the 2018 Proposal for a National Climate Change Adaptation Strategy and Action Plan from the website of the Bulgarian Ministry of Environment and Water (Processed)	12 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0263	Details
Bilingual Bulgarian-English corpus from the National Revenue Agency (BG) (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0173	Details
Bilingual collection of documents about the Cyprus Problem (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0132	Details
Bilingual collection of reports of the Greek Public Power Corporation (Processed)	13 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0244	Details
Bilingual Croatian-English Parallel Corpus (Processed)	18 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0204	Details
Bilingual documents Bulgarian-English in the field of ICT and Transport (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0133	Details
Bilingual documents Bulgarian-English in the field of open data, broadband and information society (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0134	Details
Bilingual documents Bulgarian-English in the field of transport (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0161	Details
Bilingual hr-en parallel corpus from Croatian Mine Action website (Processed)	12 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0131	Details
Bilingual hr-en parallel corpus from Croatian National Bank website (Processed)	8 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0226	Details
Bilingual hr-en parallel corpus from the Journal of the Croatian Association of Civil Engineers website (Processed)	12 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0273	Details
Bilingual hr-en parallel corpus from the National and University Library in Zagreb website (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0135	Details
Bilingual resource with Bulgarian strategic documents in the field of innovations and digital growth (Bulgarian - English) (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0153	Details
Bilingual resource with Bulgarian strategic documents in the field of telecommunications and broadband (Bulgarian - English) (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0171	Details
BMI Brochures 2011-2015 (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0200	Details
BMI Brochures and Website 2016 (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0199	Details
BMVI Publications (Processed)	5 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0197	Details
BMVI Website (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0198	Details
Catalan Corpus of News Articles	645 Mb	The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. These articles …	…	ELRA-W0047	Details
Catalan-Spanish Parallel Corpus	686 Mb	This corpus contains more than 100 million words across 10 years of bilingual articles from “El Periódico de C…	…	ELRA-W0053	Details
Central Statistical Office Dataset (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0174	Details
Chinese-Vietnamese Parallel Corpus	74 Mb	The Chinese-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sente…	…	ELRA-W0312	Details
CINTIL-DeepBank	213 Mb	The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical r…	…	ELRA-W0062	Details
CINTIL-DependencyBank	1.4 Mb	The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency g…	…	ELRA-W0061	Details
CINTIL-PropBank	3.6 Mb	The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, compos…	…	ELRA-W0056	Details
CINTIL-TreeBank	3.1 Mb	The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 11…	…	ELRA-W0055	Details
Civil Aviation Regulations (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0186	Details
Compendium The Social Insurance Institution (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0225	Details
Convention against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment - United Nations (French-English-Greek) (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0309	Details
Convention on the transfer of sentenced persons (English - Greek) (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0196	Details
Corpus of Contemporaneous Spanish Novels	4.8 Mb	This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporaneous aut…	…	ELRA-W0041	Details
Corpus of Icelandic texts from the Central Bank of Iceland (Processed)	33 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0298	Details
Corpus of State-related content from the Latvian Web (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0169	Details
Corpus on Finance and Economics from Bank of Latvia (Processed)	6 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0216	Details
CRATER 2 Corpus	359 Mb	The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of…	…	ELRA-W0033	Details
CRATER corpus	276 Mb	The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-Frenc…	…	ELRA-W0003	Details
Croatian-English corpus with Acts on Biological and Landscape Diversity and Environmental Protection (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0142	Details
Croatian-English corpus with statistical reports and studies from the Croatian Bureau of Statistics website (Processed)	9 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0264	Details
Croatian-English corpus with studies on the challenges to the Croatian Accession to the European Union from the Croatian Institute of Public Finance website (Processed)	9 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0266	Details
Croatian-English corpus with the Rural Development Programme for the Period 2014-2020 from the Croatian Rural Development Programme website (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0295	Details
Croatian-English parallel corpus from the website of the Croatian Journal of Fisheries (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0294	Details
Croatian-English parallel corpus from the website of the Embassy of Finland, Zagreb (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0292	Details
Croatian-English parallel corpus from the website of the Government Office for Cooperation with NGOs (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0291	Details
Croatian-English parallel corpus from the website of the Ministry of Foreign and European Affairs, Republic of Croatia (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0293	Details
DA-EN Danish Ministry of Higher Education and Science 2 (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0157	Details
DA-EN Danish Ministry of Higher Education and Science 3 (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0155	Details
DA-EN Danish Ministry of Higher Education and Science 4 (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0172	Details
DA-EN Danish Ministry of Higher Education and Science (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0166	Details
Danish Propbank	18 Mb	The Danish Propbank (DPB) is a multi-layer treebank, annotated not only with morphosyntactic, but also with semantic in…	…	ELRA-W0117	Details
deL1L2IM corpus	2.8 Mb	The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the…	…	ELRA-W0083	Details
Dutch PAROLE Distributable Corpus	70 Mb	The Dutch PAROLE Distributable Corpus is a 3-million-word selection from the 20-million-word Dutch PAROLE Reference c…	…	ELRA-W0019	Details
ECI-ELSNET Italian & German tagged sub-corpus	3 Mb	The objective is to provide a small but fine-grained morphosyntactically tagged corpus, 50,000 running words for each o…	…	ELRA-W0005	Details
ECI/MCI (European Corpus Initiative/Multilingual Corpus I)	655 Mb	The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual cor…	…	ELRA-W0004	Details
ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1	802 Mb	The European Comparable and Parallel Corpora of Parliamentary Speeches Archive (ECPC), compiled at the Universitat Jaum…	…	ELRA-W0128	Details
EJTN Handbook (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0163	Details
Ema-lon Manipuri Corpus (including word embedding and language model)	–	The Ema-lon Manipuri Corpus consists of a set of resources for Manipuri language (locally known as Meiteilon) for the p…	…	ELRA-W0316	Details
Employment in Poland 2009 report in EN-PL (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0242	Details
English-Chinese-Vietnamese Trilingual Parallel Corpus	6 Mb	The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The cor…	…	ELRA-W0314	Details
English - Croatian parallel corpus from texts of the Swedish Crime Victim Compensation and Support Authority (Brottsoffermyndigheten) web site (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0238	Details
English-Danish Parallel corpus from Tatoeba project (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0214	Details
English-Estonian corpus from Finnish Information Bank (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0218	Details
English-Estonian Parallel corpus compiled from translated annual reports from Estonian Academy of Sciences	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0265	Details
English-Finnish corpus from Finnish Information Bank (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0217	Details
English-Icelandic parallel corpus from Statistics Iceland (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0219	Details
English-Nepali Parallel Corpus	47 Mb	The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali …	…	ELRA-W0077	Details
English-Norwegian parallel corpus from Forbruker Europa, 2017 release (Processed)	6 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0195	Details
English-Persian parallel corpus	287 Mb	The English-Persian parallel corpus contains more than 200,000 aligned sentences across a variety of text types from th…	…	ELRA-W0118	Details
English-Persian parallel Corpus	40 Mb	Please refer to ELRA-W0118 for the latest version of this corpus. This version consists of about 3,500,000 English and …	…	ELRA-W0051	Details
ENGLISH/POLISH PHRASE BOOK FOR ADMINISTRATIVE STAFF of LOCAL GOVERNMENT UNITS (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0227	Details
English-Slovak corpus of annual reports from the Slovak National Centre for Human Rights website (Processed)	5 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0137	Details
English-Slovak corpus of annual reports on immigration and asylum policies from the EMN National Contact Point for the Slovak Republic website (Processed)	6 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0136	Details
English-Slovak parallel corpus of texts from The Ministry of Culture of the Slovak Republic (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0188	Details
English-Slovak parallel corpus of texts from The Ministry of Justice of the Slovak Republic (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0189	Details
English-Swedish corpus from Finnish Information Bank (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0222	Details
English-Swedish parallel corpus from Annual Reports of the Swedish Pension System (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0268	Details
English - Swedish parallel corpus from texts of the Swedish Crime Victim Compensation and Support Authority (Brottsoffermyndigheten) web site (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0237	Details
English-Swedish parallel corpus from the Annual Overview of Sweden’s Official aid Agency SIDA Activities (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0269	Details
English-Swedish parallel corpus from the translation of 'Sweden a Pocket Guide' book (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0130	Details
English-Swedish parallel corpus from the web site of the Swedish Migration Board - Migrationsverket (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0239	Details
English-Swedish parallel texts from The Swedish Agency for Economic and Regional Growth - Tillväxtverket (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0240	Details
English-Vietnamese Parallel Corpus	166 Mb	This is a corpus of 500,000 English-Vietnamese sentence pairs, built to develop SMT (Statistical Machine Translation) s…	…	ELRA-W0124	Details
English-Vietnamese Parallel Corpus	397 Mb	The English-Vietnamese Parallel Corpus consists of 1,000,000 sentence pairs, with an average length of 20 words per sen…	…	ELRA-W0311	Details
EUIPO - IP case law French-English (Processed)	56 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0138	Details
EUIPO - IP case law German-English (Processed)	154 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0140	Details
EUIPO - IP case law Italian-English (Processed)	22 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0141	Details
EUIPO - IP case law Spanish-English (Processed)	74 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0139	Details
EUIPO - list of goods and services French and English (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0149	Details
EUIPO - list of goods and services German and English (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0143	Details
EUIPO - list of goods and services German and French (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0145	Details
EUIPO - list of goods and services German and Italian (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0146	Details
EUIPO - list of goods and services German and Spanish (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0144	Details
EUIPO - list of goods and services Italian and English (Processed)	8 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0150	Details
EUIPO - list of goods and services Italian and French (Processed)	11 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0152	Details
EUIPO - list of goods and services Italian and Spanish (Processed)	11 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0151	Details
EUIPO - list of goods and services Spanish and English (Processed)	8 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0147	Details
EUIPO - list of goods and services Spanish and French (Processed)	11 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0148	Details
EUROPARL Corpus Parallel Corpora: Portuguese-English	2.3 Gb	The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora) was extracted from the proceedings of the Eur…	…	ELRA-W0090	Details
Expression of interest (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0209	Details
Financial Stability Reports from the National Bank of Poland (2013-14) (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0228	Details
Financial Stability Reports from the National Bank of Poland (2015-16) (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0229	Details
GeFRePaC - German French Reciprocal Parallel Corpus	1.3 Gb	The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by the Multilinguale Forschung/Multilingual Resear…	…	ELRA-W0031	Details
General Romanian-English bilingual corpus (Processed)	75 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0193	Details
Greek anti-corruption legislation and National Anti-Corruption Plan (greek-english) (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0164	Details
Greek-English parallel corpus from the website of the Prime Minister of the Hellenic Republic (Processed)	5 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0272	Details
Hallituskausi 2007-2011 -- Finnish-English Translation Memory (Processed)	23 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0220	Details
Hallituskausi 2011-2015 -- Finnish-English Translation Memory (Processed)	14 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0221	Details
Hellenic Ministry of Foreign Affairs Greek-English announcements corpus (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0271	Details
Helsinki Corpus of Swahili	1117 Mb	This is a text corpus of Swahili language of 25 million words, annotated for part-of-speech, morphology and syntax. The…	…	ELRA-W0119	Details
ICE-GB (British English component of the International Corpus of English)	97 Mb	ICE-GB is the British component of the International Corpus of English (ICE). ICE began in 1990 with the primary aim of…	…	ELRA-W0021	Details
ILSP/ELEFTHEROTYPIA Corpus (Greek corpus)	27 Mb	The ILSP/ELEFTHEROTYPIA Corpus contains approximately 3 million words classified and annotated according to the common …	…	ELRA-W0022	Details
International Agreements (Processed)	20 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0158	Details
Italian Syntactic-Semantic Treebank (ISST)	90 Mb	ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in …	…	ELRA-W0044	Details
Karl May Korpus (KMK)	77 Mb	The "Karl-May-Korpus" is a monolingual German corpus, available in an SGML-tagged ASCII text format. It contains the wo…	…	ELRA-W0016	Details
Khresmoi manually annotated reference corpus	1.3 Gb	The Manually Annotated Reference Corpus is a collection of English web documents annotated with key entities (such as d…	…	ELRA-W0081	Details
Korean-Vietnamese Parallel Corpus	62 Mb	The Korean-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per senten…	…	ELRA-W0313	Details
Laws of Malta (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0234	Details
Legal texts from Estonian Ministry of Justice (Processed)	23 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0167	Details
"Le Monde Diplomatique" Arabic tagged corpus	59 Mb	This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic, see al…	…	ELRA-W0049	Details
"Le Monde Diplomatique" Text corpus in Arabic	57 Mb	Electronic archiving of "Le Monde Diplomatique" articles in Arabic from 2000. The corpus is available in HTML. Each HTM…	…	ELRA-W0036-04	Details
"Le Monde Diplomatique" Text corpus in English	28 Mb	Electronic archiving of "Le Monde Diplomatique" articles in English from 1999. The corpus is available in HTML. Each HT…	…	ELRA-W0036-03	Details
"Le Monde Diplomatique" Text corpus in French - archives 1980-1998	233 Mb	Electronic archiving of "Le Monde Diplomatique" articles in French from 1980 to 1998. The corpus is available in HTML. …	…	ELRA-W0036-01	Details
"Le Monde Diplomatique" Text corpus in French - archives from 1999	90 Mb	Electronic archiving of "Le Monde Diplomatique" articles in French from 1999. The corpus is available in HTML. Each HTM…	…	ELRA-W0036-02	Details
Letter of rights for persons arrested and or detained (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0308	Details
Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0301	Details
LT Corpus	43 Mb	The LT Corpus is composed of 70 fiction texts from renowned Portuguese authors. The corpus contains 1,781,083 tokens. T…	…	ELRA-W0059	Details
Luxembourg Museum Websites (de-en) (Processed)	45 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0201	Details
Macroeconomic Developments (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0207	Details
Malta Government Gazette (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0233	Details
Maltese-English website parallel corpus (Processed)	10 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0232	Details
Memorandum for a ESM programme (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0210	Details
Methodological Reconciliation (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0208	Details
MLCC Multilingual and Parallel Corpora	915 Mb	The MLCC text corpus has two main components - one set to allow comparable studies to be carried out in different langu…	…	ELRA-W0023	Details
Modern French Corpus including Anaphors Tagging	13 Mb	The corpus that includes the tagging of the anaphors was created by the CRISTAL-GRESEC (Stendhal-Grenoble 3 University,…	…	ELRA-W0032	Details
Monolingual documents from the Government of Lithuania (Processed)	10 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0299	Details
Monolingual Greek corpus	5.1 Mb	Monolingual Greek corpus of 1 million words. The corpus consists of articles written in 1996 from the Greek daily newsp…	…	ELRA-W0014	Details
Monolingual Vietnamese Annotated Corpus	36 Mb	The Monolingual Vietnamese Annotated Corpus consists of 100,000 sentences, manually annotated with word boundaries, POS…	…	ELRA-W0310	Details
MTP Annotated German corpus - tagged version	35 Mb	This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP…	…	ELRA-W0008-02	Details
MTP Annotated German corpus - untagged version	283 Mb	This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP…	…	ELRA-W0008-01	Details
MULTEXT JOC Corpus	114 Mb	This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-…	…	ELRA-W0017	Details
Multilingual Corpus	9.9 Mb	Multilingual parallel corpus produced by Kaist Korterm containing 60,000 expressions in Korean, Chinese and English.	…	ELRA-W0035	Details
National Health Fund Dataset (Processed)	5 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0178	Details
Natolin European Centre Dataset (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0176	Details
NE3L named entities Arabic corpus	3 Mb	The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with nam…	…	ELRA-W0078	Details
NE3L named entities Chinese corpus	4.8 Mb	The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with nam…	…	ELRA-W0079	Details
NE3L named entities Russian corpus	2.7 Mb	The NE3L project (Named Entities 3 Languages) consisted in annotating several corpora with different languages with nam…	…	ELRA-W0080	Details
NEMLAR Written Corpus	136 Mb	This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the sa…	…	ELRA-W0042	Details
Nepali Monolingual written corpus	683 Mb	The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali …	…	ELRA-W0076	Details
Normalized Arabic Fragments for Inestimable Stemming (NAFIS)	1 Mb	Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold standard corpus composed by a c…	…	ELRA-W0127	Details
NPChunks	412 Kb	NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randoml…	…	ELRA-W0089	Details
NUM 5M Mongolian written corpus	65 Mb	This is a corpus of Mongolian text mostly from domains like online or printed daily newspapers, literature, and laws. Th…	…	ELRA-W0120	Details
PANACEA English-French and English-Greek parallel corpus acquired for Environment domain	11 Mb	The PANACEA English-French and English-Greek parallel corpus was acquired in the framework of the PANACEA project (Plat…	…	ELRA-W0057	Details
PANACEA English-French and English-Greek parallel corpus acquired for Labour Legislation domain	16 Mb	The PANACEA English-French and English-Greek parallel corpus was acquired in the framework of the PANACEA project (Plat…	…	ELRA-W0058	Details
PANACEA Environment English monolingual corpus	2.7 Gb	The PANACEA Environment English monolingual corpus was acquired in the framework of the PANACEA project (Platform for A…	…	ELRA-W0063	Details
PANACEA Environment French monolingual corpus	2.1 Gb	The PANACEA Environment French monolingual corpus was acquired in the framework of the PANACEA project (Platform for Au…	…	ELRA-W0065	Details
PANACEA Environment Greek monolingual corpus	2 Gb	The PANACEA Environment Greek monolingual corpus was acquired in the framework of the PANACEA project (Platform for Aut…	…	ELRA-W0067	Details
PANACEA Environment Italian monolingual corpus	1.8 Gb	The PANACEA Environment Italian monolingual corpus was acquired in the framework of the PANACEA project (Platform for A…	…	ELRA-W0069	Details
PANACEA Environment Spanish monolingual corpus	2.3 Gb	The PANACEA Environment Spanish monolingual corpus was acquired in the framework of the PANACEA project (Platform for A…	…	ELRA-W0071	Details
PANACEA Labour English monolingual corpus	1.6 Gb	The PANACEA Labour English monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa…	…	ELRA-W0064	Details
PANACEA Labour French monolingual corpus	2.5 Gb	The PANACEA Labour French monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automat…	…	ELRA-W0066	Details
PANACEA Labour Greek monolingual corpus	1.4 Gb	The PANACEA Labour Greek monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automati…	…	ELRA-W0068	Details
PANACEA Labour Italian monolingual corpus	2.4 Gb	The PANACEA Labour Italian monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa…	…	ELRA-W0070	Details
PANACEA Labour Spanish monolingual corpus	1.9 Gb	The PANACEA Labour Spanish monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa…	…	ELRA-W0072	Details
Parallel corpus (Bulgarian - English) in the public administration domain (Processed)	9 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0211	Details
Parallel corpus (en-pl) from the Export Promotion Portal of Poland (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0247	Details
Parallel corpus from Bank of Estonia (Processed)	8 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0162	Details
Parallel corpus from Estonian Cabinet of Ministers (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0154	Details
Parallel corpus from Estonian Ministry of Foreign Affairs (Processed)	12 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0168	Details
Parallel corpus from Parliament of Estonia (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0215	Details
Parallel corpus from Social Insurance Agency -- Försäkringskassan (Sweden) (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0213	Details
Parallel corpus from the website of the Chancellery of the Prime Minister of Poland (Processed)	6 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0249	Details
Parallel Corpus from the Web Site of the MFA of Latvia (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0159	Details
Parallel corpus (Greek - English) in the law domain (Processed) (Part1)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0205	Details
Parallel corpus (Greek - English) in the public administration domain (Processed)	14 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0203	Details
Parallel corpus (Polish - English) from the website of the Polish Investment and Trade Agency (Processed)	8 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0212	Details
Parallel Global Voices (Bulgarian - English) (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0297	Details
Parallel Global Voices (English - Polish) (Processed)	28 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0241	Details
Parallel Global Voices (Greek - English) (Processed)	43 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0202	Details
Parallel texts from Swedish Labour market agency. Part 2 (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0300	Details
Parallel texts from Swedish Labour market agency (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0302	Details
Parallel texts from Swedish National Food Agency (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0305	Details
Parallel texts from Swedish Social Security Authority (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0303	Details
Parallel texts from Swedish Work environment Authority (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0304	Details
Parallel texts from the Swedish Competition Authority - Konkurrensverket (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0231	Details
PAROLE French Corpus	349 Mb	The PAROLE French corpus contains the following data: Miscellaneous: Data provided by ELRA (CRATER, MLCC Multilingual an…	…	ELRA-W0020	Details
PAROLE Irish Distributable Corpus	25 Mb	The PAROLE Irish Distributable Corpus consists of over 8 million words (a subset of the 15+ million words Irish Referen…	…	ELRA-W0026	Details
PAROLE Italian Corpus	44 Mb	The PAROLE Italian Corpus comprises 3,135,651 words collected from four different domains: •newspapers: 2,179,800 words…	…	ELRA-W0043	Details
PAROLE Portuguese Corpus - complete version	57 Mb	The PAROLE Portuguese corpus contains approximately 3 million running words of European Portuguese distributed by Mediu…	…	ELRA-W0024-01	Details
Persian 1984 corpus (Multext-East framework)	5.9 Mb	This corpus contains the Persian (Farsi) translation of a part of the novel “1984” (G. Orwell) annotated in the Multext…	…	ELRA-W0054	Details
Persian Ezafe Construction Dataset	–	The Persian Ezafe Construction Dataset includes gold Ezafe tags in almost 30 thousand Persian sentences. The sentences …	…	ELRA-W0315	Details
PKN Orlen Dataset (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0175	Details
Polish-English parallel corpus from the website "Business in Poland" (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0274	Details
Polish-English parallel corpus from the website "geoportal.gov.pl" (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0285	Details
Polish-English parallel corpus from the website of Public Employment Services in Poland (member of EURES network) (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0259	Details
Polish-English parallel corpus from the website of the Central Statistical Office (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0279	Details
Polish-English parallel corpus from the website of the Citizens Information Board (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0251	Details
Polish-English parallel corpus from the website of the ING Polish Art Foundation (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0261	Details
Polish-English parallel corpus from the website of the Institute of Mathematics of the Polish Academy of Sciences (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0283	Details
Polish-English parallel corpus from the website of the Ministry of Agriculture and Rural Development (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0252	Details
Polish-English parallel corpus from the website of the Ministry of Culture and National Heritage (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0257	Details
Polish-English parallel corpus from the website of the Ministry of Development (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0253	Details
Polish-English parallel corpus from the website of the Ministry of Digital Affairs (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0284	Details
Polish-English parallel corpus from the website of the Ministry of Digitization (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0255	Details
Polish-English parallel corpus from the website of the Ministry of Foreign Affairs (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0256	Details
Polish-English parallel corpus from the website of the Ministry of Justice (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0254	Details
Polish-English parallel corpus from the website of the Ministry of National Defence (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0250	Details
Polish-English parallel corpus from the website of the Ministry of Regional Development (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0282	Details
Polish-English parallel corpus from the website of the Ministry of Science and Higher Education (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0286	Details
Polish-English parallel corpus from the website of the Ministry of the Interior and Administration (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0258	Details
Polish-English parallel corpus from the website of the National Audiovisual Institute (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0289	Details
Polish-English parallel corpus from the website of the National Centre for Nuclear Research (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0278	Details
Polish-English parallel corpus from the website of the National Centre for Research and Development (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0280	Details
Polish-English parallel corpus from the website of the National Digital Archives (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0290	Details
Polish-English parallel corpus from the website of the National Science Centre (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0260	Details
Polish-English parallel corpus from the website of the National Security Bureau (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0262	Details
Polish-English parallel corpus from the website of the Office of the Commissioner for Human Rights (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0281	Details
Polish-English parallel corpus from the website of the Polish Tourism Organisation (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0276	Details
Polish-English parallel corpus from the website of the State Marine Accident Investigation Commission (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0288	Details
Polish-English parallel corpus from the website of the U.S. EMBASSY and CONSULATE IN POLAND (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0277	Details
Polish-English parallel corpus from the website "Polish Aid" (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0275	Details
Polish-English parallel corpus from the website "Science in Poland" (Processed)	18 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0287	Details
Polish Food 4 & Food Policy Dataset (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0179	Details
Polish Food Dataset 2 (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0180	Details
Polish Food DataSet 3 (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0181	Details
Polish Food Dataset (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0177	Details
Polish Ministry of Foreign Affairs Historical Dataset (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0183	Details
Polish Ministry of Foreign Affairs Regional Dataset (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0182	Details
Polish Ministry of Foreign Affairs reports in EN and PL (Processed)	3 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0235	Details
Polish Ministry of Foreign Affairs Youth 2011 Report (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0184	Details
Portuguese-English bilingual corpus from Legislation concerning the Portuguese Parliament (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0245	Details
Portuguese-English bilingual corpus from the Portuguese Constitution (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0246	Details
PRESS 65	6.3 Mb	Språkdata has made available the first of its many Swedish corpora, PRESS 65. It consists of one million running words …	…	ELRA-W0010	Details
PTPARL Corpus	25 Mb	The PTPARL Corpus contains 1,076 texts consisting of adapted transcriptions of the Portuguese Parliament sessions. The …	…	ELRA-W0060	Details
Public Procurement Dataset 1 (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0187	Details
Public Procurement Dataset 2 (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0185	Details
Quaero Old Press Extended Named Entity corpus	6.8 Gb	The Quaero Old Press Extended Named Entity corpus consists of the manual annotation of 76 newspaper issues published in…	…	ELRA-W0073	Details
Qualified POS Tagged Corpus	66 Mb	Monolingual corpus in a .txt format, produced by KAIST KORTERM, containing 1020000 eojeols (Korean terms) in Korean. Th…	…	ELRA-W0034	Details
Quarterly Reports of the Parliamentary Budget Office (Hellenic Parliament) (Processed)	15 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0243	Details
ROCO Romanian journalistic corpus	729 Mb	ROCO is a Romanian journalistic corpus containing approximately 7.1 million tokens, the number of types being 231,626. …	…	ELRA-W0085	Details
Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website (Processed)	8 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0270	Details
Romanian - English literature corpus (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0192	Details
Romanian – English New Criminal Procedure Code (Processed)	4 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0170	Details
Romanian - English news corpus (Processed)	63 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0194	Details
Romanian Ombudsman archive (Processed)	5 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0206	Details
ROMBAC - Romanian balanced corpus	1.1 Gb	ROMBAC is a Romanian corpus containing equal shares of texts from 5 different genres: journalism, legalese, fiction, me…	…	ELRA-W0088	Details
Secretariat-General parallel corpus SL-EN and EN-SL (part 1) (Processed)	34 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0190	Details
Secretariat-General parallel corpus SL-EN and EN-SL (part 2) (Processed)	39 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0191	Details
SIP Publications (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0306	Details
Slovenian-English corpus with statistical reports from the Statistical Office of the Republic of Slovenia website (Processed)	9 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0267	Details
Spanish-English website parallel corpus (Processed)	9 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0248	Details
Tagged text in French (MEMODATA) with rules of morphological disambiguation	3.1 Gb	More than 170 books (classical novels, legal texts...) are tagged with rules of morphological disambiguation. A tagged …	…	ELRA-W0012	Details
Tagged text in French (MEMODATA) with typographic tags	247 Mb	More than 170 books (classical novels, legal texts...) are tagged with typographic tags. A tagged corpus of 50 books is…	…	ELRA-W0011	Details
Text corpus of "Le Monde"	3.9 Gb	Electronic archiving of "Le Monde" articles started on 1 January 1987. Some 200 articles are added every day, and as of…	…	ELRA-W0015	Details
The CINTIL Corpus – International Corpus of Portuguese	20 Mb	CINTIL-Corpus Internacional do Português is a linguistically interpreted written and spoken corpus of European Portugue…	…	ELRA-W0050	Details
The Coimisineir Teanga Bilingual Corpus of Reference Documents (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0224	Details
The Coimisineir Teanga Bilingual Corpus of Reports and Press Releases (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0230	Details
The Croatian-English corpus with the nature protection strategy of Croatia (Processed)	1 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0296	Details
The EMILLE/CIIL Corpus	1.5 Gb	The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen mo…	…	ELRA-W0037	Details
The Gaois bilingual corpus of English-Irish legislation (Processed)	26 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0223	Details
The Lancaster Corpus of Mandarin Chinese (LCMC)	45 Mb	The Lancaster Corpus of Mandarin Chinese (LCMC) is designed as a Chinese match for the FLOB and FROWN corpora for moder…	…	ELRA-W0039	Details
TRAD Arabic-English Mailing lists Parallel corpus - Development set	2 Mb	This is a parallel corpus of 10,000 words in Arabic and a reference translation in English. The source texts are emails…	…	ELRA-W0108	Details
TRAD Arabic-English Mailing lists Parallel corpus - Test set	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are email…	…	ELRA-W0106	Details
TRAD Arabic-English Newspaper Parallel corpus - Test set 1	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are artic…	…	ELRA-W0099	Details
TRAD Arabic-English Parallel corpus of transcribed Broadcast News Speech	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are trans…	…	ELRA-W0102	Details
TRAD Arabic-English Web domain (blogs) Parallel corpus	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are blog …	…	ELRA-W0104	Details
TRAD Arabic-French Mailing lists Parallel corpus - Development set	1 Mb	This is a parallel corpus of 10,000 words in Arabic and a reference translation in French. The source texts are emails …	…	ELRA-W0107	Details
TRAD Arabic-French Mailing lists Parallel corpus - Test set	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are emails…	…	ELRA-W0105	Details
TRAD Arabic-French Newspaper Parallel corpus - Test set 1	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are articl…	…	ELRA-W0098	Details
TRAD Arabic-French Newspaper Parallel corpus - Test set 2	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in French. The source texts are articl…	…	ELRA-W0100	Details
TRAD Arabic-French Parallel corpus of transcribed Broadcast News Speech	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are transc…	…	ELRA-W0101	Details
TRAD Arabic-French Web domain (blogs) Parallel corpus	2 Mb	This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are blog a…	…	ELRA-W0103	Details
TRAD Chinese-English Email Parallel corpus – Development Set	1 Mb	This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in E…	…	ELRA-W0113	Details
TRAD Chinese-English Email Parallel corpus – Test Set	1 Mb	This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in …	…	ELRA-W0115	Details
TRAD Chinese-English News Articles Parallel corpus	1 Mb	This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in …	…	ELRA-W0112	Details
TRAD Chinese-English Web domain (blogs) Parallel corpus	1 Mb	This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in …	…	ELRA-W0110	Details
TRAD Chinese-French Email Parallel corpus – Development Set	2 Mb	This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in F…	…	ELRA-W0114	Details
TRAD Chinese-French Email Parallel corpus – Test Set	2 Mb	This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in …	…	ELRA-W0116	Details
TRAD Chinese-French News Articles Parallel corpus	2 Mb	This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in …	…	ELRA-W0111	Details
TRAD Chinese-French Web domain (blogs) Parallel corpus	2 Mb	This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in …	…	ELRA-W0109	Details
TRAD Pashto-English News Articles Parallel corpus	602 Kb	This is a parallel corpus, which contains 10,000 Pashto words translated into English by two different translators. The…	…	ELRA-W0097	Details
TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data	575 Kb	This is a parallel corpus, which contains 10,000 Pashto words translated into English. The source texts come from 3 bro…	…	ELRA-W0095	Details
TRAD Pashto-French News Articles Parallel corpus	970 Kb	This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The …	…	ELRA-W0096	Details
TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data	29 Mb	This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The …	…	ELRA-W0094	Details
TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data	473 Mb	The corpus consists of the transcription of 106 hours of recordings in Pashto translated into French. The transcription…	…	ELRA-W0093	Details
TRAD Pashto Monolingual text Corpus	2.2 Gb	This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different b…	…	ELRA-W0092	Details
Training and test data for Arabizi detection and transliteration	1 Mb	The dataset is composed of two distinct resources: 1) A collection of mixed English and Arabizi text intended to train a…	…	ELRA-W0126	Details
Translation memories from The Ministry of Foreign Affairs of Norway (Processed)	620 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0156	Details
Translation memory from Swedish National Audit Office (NAO) - Riksrevisionen (Processed)	12 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0236	Details
Translations of Lithuanian legislation from Seimas of the Republic of Lithuania (Processed)	70 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0165	Details
Trilingual Documents related to International Judicial Cooperation in Civil Matters (Greek-English-French) (Processed)	2 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0307	Details
TSNLP (Test Suites for NLP Testing)	4.5 Mb	The TSNLP project (LRE 62-089) has produced a database of test suites for English, French and German containing over 4,…	…	ELRA-W0013	Details
Venice Italian Treebank (VIT)	149 Mb	The VIT, Venice Italian Treebank is the effort of the collaboration of people working at the Laboratory of Computationa…	…	ELRA-W0040	Details
Website of the President of the Republic of Lithuania (Processed)	7 Mb	This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur…	…	ELRA-W0160	Details
Wolverhampton Business English Corpus	118 Mb	The WBE was created by the Computational Linguistics Group at the University of Wolverhampton through funding from ELRA i…	…	ELRA-W0028	Details