2006 CoNLL Shared Task - Ten Languages |
85.2 Mb |
2006 CoNLL Shared Task - Ten Languages consists of dependency treebanks in ten languages used as part of the CoNLL 2006… |
…
|
ELRA-W0086
|
Details
|
|
2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish |
45 Mb |
2007 CoNLL Shared Task - Basque, Catalan, Czech & Turkish consists of dependency treebanks in four languages used as pa… |
…
|
ELRA-W0121
|
Details
|
|
2007 CoNLL Shared Task - Greek, Hungarian & Italian |
18 Mb |
2007 CoNLL Shared Task - Greek, Hungarian & Italian consists of dependency treebanks in three languages used as part of… |
…
|
ELRA-W0122
|
Details
|
|
Al-Hayat Arabic Corpus |
1.1 Gb |
The corpus was developed in the course of a research project at the University of Essex, in collaboration with the Open… |
…
|
ELRA-W0030
|
Details
|
|
Amaryllis Corpus - Evaluation Package |
505 Mb |
Launched at the end of 1995, the AMARYLLIS project aimed at evaluating information retrieval software for French text c… |
…
|
ELRA-W0029
|
Details
|
|
Amharic-English bilingual corpus |
15 Mb |
The Amharic-English bilingual corpus contains parallel text from legal and news domains in Amharic script, in translite… |
…
|
ELRA-W0074
|
Details
|
|
An-Nahar Newspaper Text Corpus |
794 Mb |
The An-Nahar Lebanon Newspaper Text Corpus comprises articles in standard Arabic from 1995 to 2000 (6 years) stored as … |
…
|
ELRA-W0027
|
Details
|
|
Arbobanko (Esperanto Treebank) |
12 Mb |
The Arbobanko (Esperanto Treebank) is a 52,000 token dependency treebank of Esperanto with texts from the MONATO news m… |
…
|
ELRA-W0129
|
Details
|
|
Arboretum treebank |
26 Mb |
The Arboretum treebank is a morphologically and syntactically annotated repository of Danish sentences, taken from Korp… |
…
|
ELRA-W0084
|
Details
|
|
ARCADE/ROMANSEVAL corpus |
63 Mb |
The ARCADE/ROMANSEVAL corpus was used as a reference corpus in two international competitions: ARCADE, an exercise on … |
…
|
ELRA-W0018
|
Details
|
|
A "scientific" corpus of modern French ("La Recherche" magazine) - Complete version |
23 Mb |
This "scientific" corpus of modern French was produced by the University of Nantes (France) within the European Commiss… |
…
|
ELRA-W0025-02
|
Details
|
|
Bilingual Bulgarian-English corpus from the 2018 Proposal for a National Climate Change Adaptation Strategy and Action Plan from the website of the Bulgarian Ministry of Environment and Water (Processed) |
12 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0263
|
Details
|
|
Bilingual Bulgarian-English corpus from the National Revenue Agency (BG) (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0173
|
Details
|
|
Bilingual collection of documents about the Cyprus Problem (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0132
|
Details
|
|
Bilingual collection of reports of the Greek Public Power Corporation (Processed) |
13 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0244
|
Details
|
|
Bilingual Croatian-English Parallel Corpus (Processed) |
18 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0204
|
Details
|
|
Bilingual documents Bulgarian-English in the field of ICT and Transport (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0133
|
Details
|
|
Bilingual documents Bulgarian-English in the field of open data, broadband and information society (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0134
|
Details
|
|
Bilingual documents Bulgarian-English in the field of transport (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0161
|
Details
|
|
Bilingual hr-en parallel corpus from Croatian Mine Action website (Processed) |
12 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0131
|
Details
|
|
Bilingual hr-en parallel corpus from Croatian National Bank website (Processed) |
8 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0226
|
Details
|
|
Bilingual hr-en parallel corpus from the Journal of the Croatian Association of Civil Engineers website (Processed) |
12 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0273
|
Details
|
|
Bilingual hr-en parallel corpus from the National and University Library in Zagreb website (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0135
|
Details
|
|
Bilingual resource with Bulgarian strategic documents in the field of innovations and digital growth (Bulgarian - English) (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0153
|
Details
|
|
Bilingual resource with Bulgarian strategic documents in the field of telecommunications and broadband (Bulgarian - English) (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0171
|
Details
|
|
BMI Brochures 2011-2015 (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0200
|
Details
|
|
BMI Brochures and Website 2016 (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0199
|
Details
|
|
BMVI Publications (Processed) |
5 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0197
|
Details
|
|
BMVI Website (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0198
|
Details
|
|
Catalan Corpus of News Articles |
645 Mb |
The Catalan Corpus of News Articles comprises articles in Catalan from 1 January 1999 to 31 March 2007. These articles … |
…
|
ELRA-W0047
|
Details
|
|
Catalan-Spanish Parallel Corpus |
686 Mb |
This corpus contains more than 100 million words, covering 10 years of bilingual articles from “El Periódico de C… |
…
|
ELRA-W0053
|
Details
|
|
Central Statistical Office Dataset (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0174
|
Details
|
|
Chinese-Vietnamese Parallel Corpus |
74 Mb |
The Chinese-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per sente… |
…
|
ELRA-W0312
|
Details
|
|
CINTIL-DeepBank |
213 Mb |
The CINTIL-DeepBank (Branco et al., 2010) is a corpus of sentences annotated with their full-fledged deep grammatical r… |
…
|
ELRA-W0062
|
Details
|
|
CINTIL-DependencyBank |
1.4 Mb |
The CINTIL-DependencyBank (Silva and Branco, 2012) is a corpus of sentences annotated with their syntactic dependency g… |
…
|
ELRA-W0061
|
Details
|
|
CINTIL-PropBank |
3.6 Mb |
The CINTIL-PropBank is a corpus of sentences annotated with their constituency structure and semantic role tags, compos… |
…
|
ELRA-W0056
|
Details
|
|
CINTIL-TreeBank |
3.1 Mb |
The CINTIL-TreeBank is a corpus of syntactic constituency trees of Portuguese texts composed of 10,039 sentences and 11… |
…
|
ELRA-W0055
|
Details
|
|
Civil Aviation Regulations (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0186
|
Details
|
|
Compendium The Social Insurance Institution (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0225
|
Details
|
|
Convention against Torture and Other Cruel, Inhuman or Degrading Treatment or Punishment - United Nations (French-English-Greek) (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0309
|
Details
|
|
Convention on the transfer of sentenced persons (English - Greek) (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0196
|
Details
|
|
Corpus of Contemporaneous Spanish Novels |
4.8 Mb |
This corpus consists of 11 novels written in Castilian Spanish by Inmaculada Ferrer-Vidal Turull, a contemporaneous aut… |
…
|
ELRA-W0041
|
Details
|
|
Corpus of Icelandic texts from the Central Bank of Iceland (Processed) |
33 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0298
|
Details
|
|
Corpus of State-related content from the Latvian Web (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0169
|
Details
|
|
Corpus on Finance and Economics from Bank of Latvia (Processed) |
6 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0216
|
Details
|
|
CRATER 2 Corpus |
359 Mb |
The CRATER corpus was built upon the foundations of an earlier project, ET10/63, which was funded in the final phase of… |
…
|
ELRA-W0033
|
Details
|
|
CRATER corpus |
276 Mb |
The Corpus Resources and Terminology Extraction project (MLAP-93 20) has extended the bilingual annotated English-Frenc… |
…
|
ELRA-W0003
|
Details
|
|
Croatian-English corpus with Acts on Biological and Landscape Diversity and Environmental Protection (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0142
|
Details
|
|
Croatian-English corpus with statistical reports and studies from the Croatian Bureau of Statistics website (Processed) |
9 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0264
|
Details
|
|
Croatian-English corpus with studies on the challenges to the Croatian Accession to the European Union from the Croatian Institute of Public Finance website (Processed) |
9 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0266
|
Details
|
|
Croatian-English corpus with the Rural Development Programme for the Period 2014-2020 from the Croatian Rural Development Programme website (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0295
|
Details
|
|
Croatian-English parallel corpus from the website of the Croatian Journal of Fisheries (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0294
|
Details
|
|
Croatian-English parallel corpus from the website of the Embassy of Finland, Zagreb (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0292
|
Details
|
|
Croatian-English parallel corpus from the website of the Government Office for Cooperation with NGOs (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0291
|
Details
|
|
Croatian-English parallel corpus from the website of the Ministry of Foreign and European Affairs, Republic of Croatia (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0293
|
Details
|
|
DA-EN Danish Ministry of Higher Education and Science 2 (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0157
|
Details
|
|
DA-EN Danish Ministry of Higher Education and Science 3 (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0155
|
Details
|
|
DA-EN Danish Ministry of Higher Education and Science 4 (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0172
|
Details
|
|
DA-EN Danish Ministry of Higher Education and Science (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0166
|
Details
|
|
Danish Propbank |
18 Mb |
The Danish Propbank (DPB) is a multi-layer treebank, annotated not only with morphosyntactic, but also with semantic in… |
…
|
ELRA-W0117
|
Details
|
|
deL1L2IM corpus |
2.8 Mb |
The deL1L2IM corpus, created between May and August 2012 and last updated in August 2014, has been collected within the… |
…
|
ELRA-W0083
|
Details
|
|
Dutch PAROLE Distributable Corpus |
70 Mb |
The Dutch PAROLE Distributable Corpus is a 3-million-word selection from the 20-million-word Dutch PAROLE Reference c… |
…
|
ELRA-W0019
|
Details
|
|
ECI-ELSNET Italian & German tagged sub-corpus |
3 Mb |
The objective is to provide a small but fine-grained morphosyntactically tagged corpus, 50,000 running words for each o… |
…
|
ELRA-W0005
|
Details
|
|
ECI/MCI (European Corpus Initiative/Multilingual Corpus I) |
655 Mb |
The European Corpus Initiative (ECI) was founded to oversee the acquisition and preparation of a large multilingual cor… |
…
|
ELRA-W0004
|
Details
|
|
ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1 |
802 Mb |
The European Comparable and Parallel Corpora of Parliamentary Speeches Archive (ECPC), compiled at the Universitat Jaum… |
…
|
ELRA-W0128
|
Details
|
|
EJTN Handbook (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0163
|
Details
|
|
Ema-lon Manipuri Corpus (including word embedding and language model) |
– |
The Ema-lon Manipuri Corpus consists of a set of resources for the Manipuri language (locally known as Meiteilon) for the p… |
…
|
ELRA-W0316
|
Details
|
|
Employment in Poland 2009 report in EN-PL (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0242
|
Details
|
|
English-Chinese-Vietnamese Trilingual Parallel Corpus |
6 Mb |
The English-Chinese-Vietnamese Trilingual Parallel Corpus consists of 20,046 trilingual sets of sentence pairs. The cor… |
…
|
ELRA-W0314
|
Details
|
|
English - Croatian parallel corpus from texts of the Swedish Crime Victim Compensation and Support Authority (Brottsoffermyndigheten) web site (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0238
|
Details
|
|
English-Danish Parallel corpus from Tatoeba project (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0214
|
Details
|
|
English-Estonian corpus from Finnish Information Bank (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0218
|
Details
|
|
English-Estonian Parallel corpus compiled from translated annual reports from Estonian Academy of Sciences |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0265
|
Details
|
|
English-Finnish corpus from Finnish Information Bank (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0217
|
Details
|
|
English-Icelandic parallel corpus from Statistics Iceland (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0219
|
Details
|
|
English-Nepali Parallel Corpus |
47 Mb |
The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali … |
…
|
ELRA-W0077
|
Details
|
|
English-Norwegian parallel corpus from Forbruker Europa, 2017 release (Processed) |
6 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0195
|
Details
|
|
English-Persian parallel corpus |
287 Mb |
The English-Persian parallel corpus contains more than 200,000 aligned sentences across a variety of text types from th… |
…
|
ELRA-W0118
|
Details
|
|
English-Persian parallel Corpus |
40 Mb |
Please refer to ELRA-W0118 for the latest version of this corpus. This version consists of about 3,500,000 English and … |
…
|
ELRA-W0051
|
Details
|
|
ENGLISH/POLISH PHRASE BOOK FOR ADMINISTRATIVE STAFF of LOCAL GOVERNMENT UNITS (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0227
|
Details
|
|
English-Slovak corpus of annual reports from the Slovak National Centre for Human Rights website (Processed) |
5 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0137
|
Details
|
|
English-Slovak corpus of annual reports on immigration and asylum policies from the EMN National Contact Point for the Slovak Republic website (Processed) |
6 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0136
|
Details
|
|
English-Slovak parallel corpus of texts from The Ministry of Culture of the Slovak Republic (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0188
|
Details
|
|
English-Slovak parallel corpus of texts from The Ministry of Justice of the Slovak Republic (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0189
|
Details
|
|
English-Swedish corpus from Finnish Information Bank (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0222
|
Details
|
|
English-Swedish parallel corpus from Annual Reports of the Swedish Pension System (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0268
|
Details
|
|
English - Swedish parallel corpus from texts of the Swedish Crime Victim Compensation and Support Authority (Brottsoffermyndigheten) web site (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0237
|
Details
|
|
English-Swedish parallel corpus from the Annual Overview of Sweden’s Official aid Agency SIDA Activities (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0269
|
Details
|
|
English-Swedish parallel corpus from the translation of 'Sweden a Pocket Guide' book (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0130
|
Details
|
|
English-Swedish parallel corpus from the web site of the Swedish Migration Board - Migrationsverket (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0239
|
Details
|
|
English-Swedish parallel texts from The Swedish Agency for Economic and Regional Growth - Tillväxtverket (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0240
|
Details
|
|
English-Vietnamese Parallel Corpus |
166 Mb |
This is a corpus of 500,000 English-Vietnamese sentence pairs, built to develop SMT (Statistical Machine Translation) s… |
…
|
ELRA-W0124
|
Details
|
|
English-Vietnamese Parallel Corpus |
397 Mb |
The English-Vietnamese Parallel Corpus consists of 1,000,000 sentence pairs, with an average length of 20 words per sen… |
…
|
ELRA-W0311
|
Details
|
|
EUIPO - IP case law French-English (Processed) |
56 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0138
|
Details
|
|
EUIPO - IP case law German-English (Processed) |
154 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0140
|
Details
|
|
EUIPO - IP case law Italian-English (Processed) |
22 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0141
|
Details
|
|
EUIPO - IP case law Spanish-English (Processed) |
74 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0139
|
Details
|
|
EUIPO - list of goods and services French and English (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0149
|
Details
|
|
EUIPO - list of goods and services German and English (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0143
|
Details
|
|
EUIPO - list of goods and services German and French (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0145
|
Details
|
|
EUIPO - list of goods and services German and Italian (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0146
|
Details
|
|
EUIPO - list of goods and services German and Spanish (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0144
|
Details
|
|
EUIPO - list of goods and services Italian and English (Processed) |
8 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0150
|
Details
|
|
EUIPO - list of goods and services Italian and French (Processed) |
11 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0152
|
Details
|
|
EUIPO - list of goods and services Italian and Spanish (Processed) |
11 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0151
|
Details
|
|
EUIPO - list of goods and services Spanish and English (Processed) |
8 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0147
|
Details
|
|
EUIPO - list of goods and services Spanish and French (Processed) |
11 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0148
|
Details
|
|
EUROPARL Corpus Parallel Corpora: Portuguese-English |
2.3 Gb |
The EUROPARL Corpus (Portuguese-English subpart of the parallel corpora), was extracted from the proceedings of the Eur… |
…
|
ELRA-W0090
|
Details
|
|
Expression of interest (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0209
|
Details
|
|
Financial Stability Reports from the National Bank of Poland (2013-14) (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0228
|
Details
|
|
Financial Stability Reports from the National Bank of Poland (2015-16) (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0229
|
Details
|
|
GeFRePaC - German French Reciprocal Parallel Corpus |
1.3 Gb |
The German-French Reciprocal Parallel Corpus (GeFRePaC) was produced by the Multilinguale Forschung/Multilingual Resear… |
…
|
ELRA-W0031
|
Details
|
|
General Romanian-English bilingual corpus (Processed) |
75 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0193
|
Details
|
|
Greek anti-corruption legislation and National Anti-Corruption Plan (greek-english) (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0164
|
Details
|
|
Greek-English parallel corpus from the website of the Prime Minister of the Hellenic Republic (Processed) |
5 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0272
|
Details
|
|
Hallituskausi 2007-2011 -- Finnish-English Translation Memory (Processed) |
23 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0220
|
Details
|
|
Hallituskausi 2011-2015 -- Finnish-English Translation Memory (Processed) |
14 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0221
|
Details
|
|
Hellenic Ministry of Foreign Affairs Greek-English announcements corpus (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0271
|
Details
|
|
Helsinki Corpus of Swahili |
1.1 Gb |
This is a 25-million-word text corpus of Swahili, annotated for part of speech, morphology and syntax. The… |
…
|
ELRA-W0119
|
Details
|
|
ICE-GB (British English component of the International Corpus of English) |
97 Mb |
ICE-GB is the British component of the International Corpus of English (ICE). ICE began in 1990 with the primary aim of… |
…
|
ELRA-W0021
|
Details
|
|
ILSP/ELEFTHEROTYPIA Corpus (Greek corpus) |
27 Mb |
The ILSP/ELEFTHEROTYPIA Corpus contains approximately 3 million words classified and annotated according to the common … |
…
|
ELRA-W0022
|
Details
|
|
International Agreements (Processed) |
20 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0158
|
Details
|
|
Italian Syntactic-Semantic Treebank (ISST) |
90 Mb |
ISST comprises 89,941 tokens for the financial-domain part and 215,606 tokens for the general part. It is formatted in … |
…
|
ELRA-W0044
|
Details
|
|
Karl May Korpus (KMK) |
77 Mb |
The "Karl-May-Korpus" is a monolingual German corpus, available in an SGML-tagged ASCII text format. It contains the wo… |
…
|
ELRA-W0016
|
Details
|
|
Khresmoi manually annotated reference corpus |
1.3 Gb |
The Manually Annotated Reference Corpus is a collection of English web documents annotated with key entities (such as d… |
…
|
ELRA-W0081
|
Details
|
|
Korean-Vietnamese Parallel Corpus |
62 Mb |
The Korean-Vietnamese Parallel Corpus consists of 200,000 sentence pairs, with an average length of 15 words per senten… |
…
|
ELRA-W0313
|
Details
|
|
Laws of Malta (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0234
|
Details
|
|
Legal texts from Estonian Ministry of Justice (Processed) |
23 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0167
|
Details
|
|
"Le Monde Diplomatique" Arabic tagged corpus |
59 Mb |
This corpus contains 102,960 vowelised, lemmatised and tagged words (58 texts from Le Monde Diplomatique Arabic, see al… |
…
|
ELRA-W0049
|
Details
|
|
"Le Monde Diplomatique" Text corpus in Arabic |
57 Mb |
Electronic archiving of "Le Monde Diplomatique" articles in Arabic from 2000. The corpus is available in HTML. Each HTM… |
…
|
ELRA-W0036-04
|
Details
|
|
"Le Monde Diplomatique" Text corpus in English |
28 Mb |
Electronic archiving of "Le Monde Diplomatique" articles in English from 1999. The corpus is available in HTML. Each HT… |
…
|
ELRA-W0036-03
|
Details
|
|
"Le Monde Diplomatique" Text corpus in French - archives 1980-1998 |
233 Mb |
Electronic archiving of "Le Monde Diplomatique" articles in French from 1980 to 1998. The corpus is available in HTML. … |
…
|
ELRA-W0036-01
|
Details
|
|
"Le Monde Diplomatique" Text corpus in French - archives from 1999 |
90 Mb |
Electronic archiving of "Le Monde Diplomatique" articles in French from 1999. The corpus is available in HTML. Each HTM… |
…
|
ELRA-W0036-02
|
Details
|
|
Letter of rights for persons arrested and or detained (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0308
|
Details
|
|
Letter of rights for persons arrested on the basis of a European Arrest Warrant (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0301
|
Details
|
|
LT Corpus |
43 Mb |
The LT Corpus is composed of 70 fiction texts from renowned Portuguese authors. The corpus contains 1,781,083 tokens. T… |
…
|
ELRA-W0059
|
Details
|
|
Luxembourg Museum Websites (de-en) (Processed) |
45 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0201
|
Details
|
|
Macroeconomic Developments (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0207
|
Details
|
|
Malta Government Gazette (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0233
|
Details
|
|
Maltese-English website parallel corpus (Processed) |
10 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0232
|
Details
|
|
Memorandum for a ESM programme (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0210
|
Details
|
|
Methodological Reconciliation (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0208
|
Details
|
|
MLCC Multilingual and Parallel Corpora |
915 Mb |
The MLCC text corpus has two main components - one set to allow comparable studies to be carried out in different langu… |
…
|
ELRA-W0023
|
Details
|
|
Modern French Corpus including Anaphors Tagging |
13 Mb |
The corpus that includes the tagging of the anaphors was created by the CRISTAL-GRESEC (Stendhal-Grenoble 3 University,… |
…
|
ELRA-W0032
|
Details
|
|
Monolingual documents from the Government of Lithuania (Processed) |
10 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0299
|
Details
|
|
Monolingual Greek corpus |
5.1 Mb |
Monolingual Greek corpus of 1 million words. The corpus consists of articles written in 1996 from the Greek daily newsp… |
…
|
ELRA-W0014
|
Details
|
|
Monolingual Vietnamese Annotated Corpus |
36 Mb |
The Monolingual Vietnamese Annotated Corpus consists of 100,000 sentences, manually annotated with word boundaries, POS… |
…
|
ELRA-W0310
|
Details
|
|
MTP Annotated German corpus - tagged version |
35 Mb |
This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP… |
…
|
ELRA-W0008-02
|
Details
|
|
MTP Annotated German corpus - untagged version |
283 Mb |
This morphosyntactically annotated 500,000 word German corpus was developed as part of the Münster Tagging Project (MTP… |
…
|
ELRA-W0008-01
|
Details
|
|
MULTEXT JOC Corpus |
114 Mb |
This CD-ROM contains a part of the corpus developed in the MULTEXT project financed by the European Commission (LRE 62-… |
…
|
ELRA-W0017
|
Details
|
|
Multilingual Corpus |
9.9 Mb |
Multilingual parallel corpus produced by KAIST KORTERM containing 60,000 expressions in Korean, Chinese and English. |
…
|
ELRA-W0035
|
Details
|
|
National Health Fund Dataset (Processed) |
5 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0178
|
Details
|
|
Natolin European Centre Dataset (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0176
|
Details
|
|
NE3L named entities Arabic corpus |
3 Mb |
The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with nam… |
…
|
ELRA-W0078
|
Details
|
|
NE3L named entities Chinese corpus |
4.8 Mb |
The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with nam… |
…
|
ELRA-W0079
|
Details
|
|
NE3L named entities Russian corpus |
2.7 Mb |
The NE3L project (Named Entities 3 Languages) consisted of annotating several corpora in different languages with nam… |
…
|
ELRA-W0080
|
Details
|
|
NEMLAR Written Corpus |
136 Mb |
This corpus was produced within the NEMLAR project (http://www.nemlar.org). Two other resources, produced within the sa… |
…
|
ELRA-W0042
|
Details
|
|
Nepali Monolingual written corpus |
683 Mb |
The Nepali Monolingual written corpus is one of the 3 resources that constitute the Nepali National Corpus. The Nepali … |
…
|
ELRA-W0076
|
Details
|
|
Normalized Arabic Fragments for Inestimable Stemming (NAFIS) |
1 Mb |
Normalized Arabic Fragments for Inestimable Stemming (NAFIS) is an Arabic stemming gold standard corpus composed by a c… |
…
|
ELRA-W0127
|
Details
|
|
NPChunks |
412 Kb |
NPChunks is a training corpus containing approximately 1,000 sentences, with a total of 24,243 tokens, selected randoml… |
…
|
ELRA-W0089
|
Details
|
|
NUM 5M Mongolian written corpus |
65 Mb |
This is a corpus of Mongolian text mostly from domains like online or printed daily newspapers, literature, and laws. Th… |
…
|
ELRA-W0120
|
Details
|
|
PANACEA English-French and English-Greek parallel corpus acquired for Environment domain |
11 Mb |
The PANACEA English-French and English-Greek parallel corpus was acquired in the framework of the PANACEA project (Plat… |
…
|
ELRA-W0057
|
Details
|
|
PANACEA English-French and English-Greek parallel corpus acquired for Labour Legislation domain |
16 Mb |
The PANACEA English-French and English-Greek parallel corpus was acquired in the framework of the PANACEA project (Plat… |
…
|
ELRA-W0058
|
Details
|
|
PANACEA Environment English monolingual corpus |
2.7 Gb |
The PANACEA Environment English monolingual corpus was acquired in the framework of the PANACEA project (Platform for A… |
…
|
ELRA-W0063
|
Details
|
|
PANACEA Environment French monolingual corpus |
2.1 Gb |
The PANACEA Environment French monolingual corpus was acquired in the framework of the PANACEA project (Platform for Au… |
…
|
ELRA-W0065
|
Details
|
|
PANACEA Environment Greek monolingual corpus |
2 Gb |
The PANACEA Environment Greek monolingual corpus was acquired in the framework of the PANACEA project (Platform for Aut… |
…
|
ELRA-W0067
|
Details
|
|
PANACEA Environment Italian monolingual corpus |
1.8 Gb |
The PANACEA Environment Italian monolingual corpus was acquired in the framework of the PANACEA project (Platform for A… |
…
|
ELRA-W0069
|
Details
|
|
PANACEA Environment Spanish monolingual corpus |
2.3 Gb |
The PANACEA Environment Spanish monolingual corpus was acquired in the framework of the PANACEA project (Platform for A… |
…
|
ELRA-W0071
|
Details
|
|
PANACEA Labour English monolingual corpus |
1.6 Gb |
The PANACEA Labour English monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa… |
…
|
ELRA-W0064
|
Details
|
|
PANACEA Labour French monolingual corpus |
2.5 Gb |
The PANACEA Labour French monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automat… |
…
|
ELRA-W0066
|
Details
|
|
PANACEA Labour Greek monolingual corpus |
1.4 Gb |
The PANACEA Labour Greek monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automati… |
…
|
ELRA-W0068
|
Details
|
|
PANACEA Labour Italian monolingual corpus |
2.4 Gb |
The PANACEA Labour Italian monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa… |
…
|
ELRA-W0070
|
Details
|
|
PANACEA Labour Spanish monolingual corpus |
1.9 Gb |
The PANACEA Labour Spanish monolingual corpus was acquired in the framework of the PANACEA project (Platform for Automa… |
…
|
ELRA-W0072
|
Details
|
|
Parallel corpus (Bulgarian - English) in the public administration domain (Processed) |
9 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0211
|
Details
|
|
Parallel corpus (en-pl) from the Export Promotion Portal of Poland (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0247
|
Details
|
|
Parallel corpus from Bank of Estonia (Processed) |
8 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0162
|
Details
|
|
Parallel corpus from Estonian Cabinet of Ministers (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0154
|
Details
|
|
Parallel corpus from Estonian Ministry of Foreign Affairs (Processed) |
12 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0168
|
Details
|
|
Parallel corpus from Parliament of Estonia (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0215
|
Details
|
|
Parallel corpus from Social Insurance Agency -- Försäkringskassan (Sweden) (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0213
|
Details
|
|
Parallel corpus from the website of the Chancellery of the Prime Minister of Poland (Processed) |
6 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0249
|
Details
|
|
Parallel Corpus from the Web Site of the MFA of Latvia (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0159
|
Details
|
|
Parallel corpus (Greek - English) in the law domain (Processed) (Part1) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0205
|
Details
|
|
Parallel corpus (Greek - English) in the public administration domain (Processed) |
14 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0203
|
Details
|
|
Parallel corpus (Polish - English) from the website of the Polish Investment and Trade Agency (Processed) |
8 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0212
|
Details
|
|
Parallel Global Voices (Bulgarian - English) (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0297
|
Details
|
|
Parallel Global Voices (English - Polish) (Processed) |
28 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0241
|
Details
|
|
Parallel Global Voices (Greek - English) (Processed) |
43 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0202
|
Details
|
|
Parallel texts from Swedish Labour market agency. Part 2 (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0300
|
Details
|
|
Parallel texts from Swedish Labour market agency (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0302
|
Details
|
|
Parallel texts from Swedish National Food Agency (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0305
|
Details
|
|
Parallel texts from Swedish Social Security Authority (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0303
|
Details
|
|
Parallel texts from Swedish Work environment Authority (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0304
|
Details
|
|
Parallel texts from the Swedish Competition Authority - Konkurrensverket (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0231
|
Details
|
|
PAROLE French Corpus |
349 Mb |
The PAROLE French corpus contains the following data: Miscellaneous: Data provided by ELRA (CRATER, MLCC Multilingual an… |
…
|
ELRA-W0020
|
Details
|
|
PAROLE Irish Distributable Corpus |
25 Mb |
The PAROLE Irish Distributable Corpus consists of over 8 million words (a subset of the 15+ million words Irish Referen… |
…
|
ELRA-W0026
|
Details
|
|
PAROLE Italian Corpus |
44 Mb |
The PAROLE Italian Corpus comprises 3,135,651 words collected from four different domains: • newspapers: 2,179,800 words… |
…
|
ELRA-W0043
|
Details
|
|
PAROLE Portuguese Corpus - complete version |
57 Mb |
The PAROLE Portuguese corpus contains approximately 3 million running words of European Portuguese distributed by Mediu… |
…
|
ELRA-W0024-01
|
Details
|
|
Persian 1984 corpus (Multext-East framework) |
5.9 Mb |
This corpus contains the Persian (Farsi) translation of a part of the novel “1984” (G. Orwell) annotated in the Multext… |
…
|
ELRA-W0054
|
Details
|
|
Persian Ezafe Construction Dataset |
– |
The Persian Ezafe Construction Dataset includes gold Ezafe tags in almost 30 thousand Persian sentences. The sentences … |
…
|
ELRA-W0315
|
Details
|
|
PKN Orlen Dataset (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0175
|
Details
|
|
Polish-English parallel corpus from the website "Business in Poland" (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0274
|
Details
|
|
Polish-English parallel corpus from the website "geoportal.gov.pl" (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0285
|
Details
|
|
Polish-English parallel corpus from the website of Public Employment Services in Poland (member of EURES network) (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0259
|
Details
|
|
Polish-English parallel corpus from the website of the Central Statistical Office (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0279
|
Details
|
|
Polish-English parallel corpus from the website of the Citizens Information Board (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0251
|
Details
|
|
Polish-English parallel corpus from the website of the ING Polish Art Foundation (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0261
|
Details
|
|
Polish-English parallel corpus from the website of the Institute of Mathematics of the Polish Academy of Sciences (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0283
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Agriculture and Rural Development (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0252
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Culture and National Heritage (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0257
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Development (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0253
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Digital Affairs (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0284
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Digitization (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0255
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Foreign Affairs (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0256
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Justice (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0254
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of National Defence (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0250
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Regional Development (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0282
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of Science and Higher Education (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0286
|
Details
|
|
Polish-English parallel corpus from the website of the Ministry of the Interior and Administration (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0258
|
Details
|
|
Polish-English parallel corpus from the website of the National Audiovisual Institute (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0289
|
Details
|
|
Polish-English parallel corpus from the website of the National Centre for Nuclear Research (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0278
|
Details
|
|
Polish-English parallel corpus from the website of the National Centre for Research and Development (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0280
|
Details
|
|
Polish-English parallel corpus from the website of the National Digital Archives (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0290
|
Details
|
|
Polish-English parallel corpus from the website of the National Science Centre (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0260
|
Details
|
|
Polish-English parallel corpus from the website of the National Security Bureau (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0262
|
Details
|
|
Polish-English parallel corpus from the website of the Office of the Commissioner for Human Rights (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0281
|
Details
|
|
Polish-English parallel corpus from the website of the Polish Tourism Organisation (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0276
|
Details
|
|
Polish-English parallel corpus from the website of the State Marine Accident Investigation Commission (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0288
|
Details
|
|
Polish-English parallel corpus from the website of the U.S. EMBASSY and CONSULATE IN POLAND (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0277
|
Details
|
|
Polish-English parallel corpus from the website "Polish Aid" (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0275
|
Details
|
|
Polish-English parallel corpus from the website "Science in Poland" (Processed) |
18 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0287
|
Details
|
|
Polish Food 4 & Food Policy Dataset (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0179
|
Details
|
|
Polish Food Dataset 2 (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0180
|
Details
|
|
Polish Food DataSet 3 (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0181
|
Details
|
|
Polish Food Dataset (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0177
|
Details
|
|
Polish Ministry of Foreign Affairs Historical Dataset (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0183
|
Details
|
|
Polish Ministry of Foreign Affairs Regional Dataset (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0182
|
Details
|
|
Polish Ministry of Foreign Affairs reports in EN and PL (Processed) |
3 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0235
|
Details
|
|
Polish Ministry of Foreign Affairs Youth 2011 Report (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0184
|
Details
|
|
Portuguese-English bilingual corpus from Legislation concerning the Portuguese Parliament (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0245
|
Details
|
|
Portuguese-English bilingual corpus from the Portuguese Constitution (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0246
|
Details
|
|
PRESS 65 |
6.3 Mb |
Språkdata has made available the first of its many Swedish corpora, PRESS 65. It consists of one million running words … |
…
|
ELRA-W0010
|
Details
|
|
PTPARL Corpus |
25 Mb |
The PTPARL Corpus contains 1,076 texts consisting of adapted transcriptions of the Portuguese Parliament sessions. The … |
…
|
ELRA-W0060
|
Details
|
|
Public Procurement Dataset 1 (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0187
|
Details
|
|
Public Procurement Dataset 2 (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0185
|
Details
|
|
Quaero Old Press Extended Named Entity corpus |
6.8 Gb |
The Quaero Old Press Extended Named Entity corpus consists of the manual annotation of 76 newspaper issues published in… |
…
|
ELRA-W0073
|
Details
|
|
Qualified POS Tagged Corpus |
66 Mb |
Monolingual corpus in .txt format, produced by KAIST KORTERM, containing 1,020,000 eojeols (Korean terms) in Korean. Th… |
…
|
ELRA-W0034
|
Details
|
|
Quarterly Reports of the Parliamentary Budget Office (Hellenic Parliament) (Processed) |
15 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0243
|
Details
|
|
ROCO Romanian journalistic corpus |
729 Mb |
ROCO is a Romanian journalistic corpus containing approximately 7.1 million tokens, the number of types being 231,626. … |
…
|
ELRA-W0085
|
Details
|
|
Romanian-English corpus with studies, reports and statistical data in the field of culture from the National Institute for Cultural Research and Training website (Processed) |
8 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0270
|
Details
|
|
Romanian - English literature corpus (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0192
|
Details
|
|
Romanian – English New Criminal Procedure Code (Processed) |
4 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0170
|
Details
|
|
Romanian - English news corpus (Processed) |
63 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0194
|
Details
|
|
Romanian Ombudsman archive (Processed) |
5 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0206
|
Details
|
|
ROMBAC - Romanian balanced corpus |
1.1 Gb |
ROMBAC is a Romanian corpus containing equal shares of texts from 5 different genres: journalism, legalese, fiction, me… |
…
|
ELRA-W0088
|
Details
|
|
Secretariat-General parallel corpus SL-EN and EN-SL (part 1) (Processed) |
34 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0190
|
Details
|
|
Secretariat-General parallel corpus SL-EN and EN-SL (part 2) (Processed) |
39 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0191
|
Details
|
|
SIP Publications (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0306
|
Details
|
|
Slovenian-English corpus with statistical reports from the Statistical Office of the Republic of Slovenia website (Processed) |
9 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0267
|
Details
|
|
Spanish-English website parallel corpus (Processed) |
9 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0248
|
Details
|
|
Tagged text in French (MEMODATA) with rules of morphological disambiguation |
3.1 Gb |
More than 170 books (classical novels, legal texts...) are tagged with rules of morphological disambiguation. A tagged … |
…
|
ELRA-W0012
|
Details
|
|
Tagged text in French (MEMODATA) with typographic tags |
247 Mb |
More than 170 books (classical novels, legal texts...) are tagged with typographic tags. A tagged corpus of 50 books is… |
…
|
ELRA-W0011
|
Details
|
|
Text corpus of "Le Monde" |
3.9 Gb |
Electronic archiving of "Le Monde" articles started on 1 January 1987. Some 200 articles are added every day, and as of… |
…
|
ELRA-W0015
|
Details
|
|
The CINTIL Corpus – International Corpus of Portuguese |
20 Mb |
CINTIL-Corpus Internacional do Português is a linguistically interpreted written and spoken corpus of European Portugue… |
…
|
ELRA-W0050
|
Details
|
|
The Coimisineir Teanga Bilingual Corpus of Reference Documents (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0224
|
Details
|
|
The Coimisineir Teanga Bilingual Corpus of Reports and Press Releases (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0230
|
Details
|
|
The Croatian-English corpus with the nature protection strategy of Croatia (Processed) |
1 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0296
|
Details
|
|
The EMILLE/CIIL Corpus |
1.5 Gb |
The EMILLE/CIIL Corpus consists of three components: monolingual, parallel and annotated corpora. There are fourteen mo… |
…
|
ELRA-W0037
|
Details
|
|
The Gaois bilingual corpus of English-Irish legislation (Processed) |
26 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0223
|
Details
|
|
The Lancaster Corpus of Mandarin Chinese (LCMC) |
45 Mb |
The Lancaster Corpus of Mandarin Chinese (LCMC) is designed as a Chinese match for the FLOB and FROWN corpora for moder… |
…
|
ELRA-W0039
|
Details
|
|
TRAD Arabic-English Mailing lists Parallel corpus - Development set |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and a reference translation in English. The source texts are emails… |
…
|
ELRA-W0108
|
Details
|
|
TRAD Arabic-English Mailing lists Parallel corpus - Test set |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are email… |
…
|
ELRA-W0106
|
Details
|
|
TRAD Arabic-English Newspaper Parallel corpus - Test set 1 |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are artic… |
…
|
ELRA-W0099
|
Details
|
|
TRAD Arabic-English Parallel corpus of transcribed Broadcast News Speech |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are trans… |
…
|
ELRA-W0102
|
Details
|
|
TRAD Arabic-English Web domain (blogs) Parallel corpus |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in English. The source texts are blog … |
…
|
ELRA-W0104
|
Details
|
|
TRAD Arabic-French Mailing lists Parallel corpus - Development set |
1 Mb |
This is a parallel corpus of 10,000 words in Arabic and a reference translation in French. The source texts are emails … |
…
|
ELRA-W0107
|
Details
|
|
TRAD Arabic-French Mailing lists Parallel corpus - Test set |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are emails… |
…
|
ELRA-W0105
|
Details
|
|
TRAD Arabic-French Newspaper Parallel corpus - Test set 1 |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are articl… |
…
|
ELRA-W0098
|
Details
|
|
TRAD Arabic-French Newspaper Parallel corpus - Test set 2 |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 2 reference translations in French. The source texts are articl… |
…
|
ELRA-W0100
|
Details
|
|
TRAD Arabic-French Parallel corpus of transcribed Broadcast News Speech |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are transc… |
…
|
ELRA-W0101
|
Details
|
|
TRAD Arabic-French Web domain (blogs) Parallel corpus |
2 Mb |
This is a parallel corpus of 10,000 words in Arabic and 4 reference translations in French. The source texts are blog a… |
…
|
ELRA-W0103
|
Details
|
|
TRAD Chinese-English Email Parallel corpus – Development Set |
1 Mb |
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in E… |
…
|
ELRA-W0113
|
Details
|
|
TRAD Chinese-English Email Parallel corpus – Test Set |
1 Mb |
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … |
…
|
ELRA-W0115
|
Details
|
|
TRAD Chinese-English News Articles Parallel corpus |
1 Mb |
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … |
…
|
ELRA-W0112
|
Details
|
|
TRAD Chinese-English Web domain (blogs) Parallel corpus |
1 Mb |
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … |
…
|
ELRA-W0110
|
Details
|
|
TRAD Chinese-French Email Parallel corpus – Development Set |
2 Mb |
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and a reference translation in F… |
…
|
ELRA-W0114
|
Details
|
|
TRAD Chinese-French Email Parallel corpus – Test Set |
2 Mb |
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … |
…
|
ELRA-W0116
|
Details
|
|
TRAD Chinese-French News Articles Parallel corpus |
2 Mb |
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … |
…
|
ELRA-W0111
|
Details
|
|
TRAD Chinese-French Web domain (blogs) Parallel corpus |
2 Mb |
This is a parallel corpus of 15,000 characters in Chinese (equivalent to 10,000 words) and 2 reference translations in … |
…
|
ELRA-W0109
|
Details
|
|
TRAD Pashto-English News Articles Parallel corpus |
602 Kb |
This is a parallel corpus, which contains 10,000 Pashto words translated into English by two different translators. The… |
…
|
ELRA-W0097
|
Details
|
|
TRAD Pashto-English Parallel corpus of transcribed Broadcast News Speech - Test data |
575 Kb |
This is a parallel corpus, which contains 10,000 Pashto words translated into English. The source texts come from 3 bro… |
…
|
ELRA-W0095
|
Details
|
|
TRAD Pashto-French News Articles Parallel corpus |
970 Kb |
This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The … |
…
|
ELRA-W0096
|
Details
|
|
TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Test data |
29 Mb |
This is a parallel corpus, which contains 10,000 Pashto words translated into French by two different translators. The … |
…
|
ELRA-W0094
|
Details
|
|
TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data |
473 Mb |
The corpus consists of the transcription of 106 hours of recordings in Pashto translated into French. The transcription… |
…
|
ELRA-W0093
|
Details
|
|
TRAD Pashto Monolingual text Corpus |
2.2 Gb |
This is a monolingual text corpus in Pashto. The corpus contains about 112,000,000 tokens collected from 46 different b… |
…
|
ELRA-W0092
|
Details
|
|
Training and test data for Arabizi detection and transliteration |
1 Mb |
The dataset is composed of two distinct resources: 1) A collection of mixed English and Arabizi text intended to train a… |
…
|
ELRA-W0126
|
Details
|
|
Translation memories from The Ministry of Foreign Affairs of Norway (Processed) |
620 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0156
|
Details
|
|
Translation memory from Swedish National Audit Office (NAO) - Riksrevisionen (Processed) |
12 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0236
|
Details
|
|
Translations of Lithuanian legislation from Seimas of the Republic of Lithuania (Processed) |
70 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0165
|
Details
|
|
Trilingual Documents related to International Judicial Cooperation in Civil Matters (Greek-English-French) (Processed) |
2 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0307
|
Details
|
|
TSNLP (Test Suites for NLP Testing) |
4.5 Mb |
The TSNLP project (LRE 62-089) has produced a database of test suites for English, French and German containing over 4,… |
…
|
ELRA-W0013
|
Details
|
|
Venice Italian Treebank (VIT) |
149 Mb |
The VIT (Venice Italian Treebank) is the result of a collaboration among people working at the Laboratory of Computationa… |
…
|
ELRA-W0040
|
Details
|
|
Website of the President of the Republic of Lithuania (Processed) |
7 Mb |
This dataset has been created within the framework of the European Language Resource Coordination (ELRC) Connecting Eur… |
…
|
ELRA-W0160
|
Details
|
|
Wolverhampton Business English Corpus |
118 Mb |
The WBE was created by the Computational Linguistics Group at the University of Wolverhampton through funding from ELRA i… |
…
|
ELRA-W0028
|
Details
|
|