163 |
CEHugeWebCorpus
Abstract:
This corpus was originally created for performance testing of the CorpusExplorer server infrastructure (see diskurslinguistik.net / diskursmonitor.de). It contains the German-only filtered subset of CommonCrawl (as of March 2018). First, the URLs were filtered by top-level domain (de, at, ch). The texts were then classified with NTextCat, and only texts identified unambiguously as German were included in the corpus. Finally, the texts were annotated with TreeTagger (token, lemma, part of speech). The corpus comprises 2.58 million documents, 232.87 million sentences, and 3.021 billion tokens. CorpusExplorer (http://hdl.handle.net/11234/1-2634) can convert this data into various other corpus formats (XML, JSON, WebLicht, TXM, and many more).
Keyword:
corpus; CorpusExplorer; German; Germanistik; web corpora; Web corpus
URL: http://hdl.handle.net/11372/LRT-2638
BASE
164 |
Création de ressources lexicographiques Français–Slovène d'aide à la traduction spécialisée
In: Lexikos; Vol. 30 (2020) ; 2224-0039 (2020)
BASE
165 |
A Critical Evaluation of Three Sesotho Dictionaries
In: Lexikos; Vol. 30 (2020) ; 2224-0039 (2020)
BASE
167 |
Confini e sconfinamenti negli archivi testuali e nei vocabolari elettronici ...
BASE
168 |
Языковая политика и языковые ресурсы в китайском интернете ... : Language policy and language resources on the Chinese Internet ...
BASE
169 |
Automatização no diagnóstico de nível de língua ; anotação e versatilidade dos recursos para PLE
BASE
170 |
Variation in Female and Male Dialogue in Buffy the Vampire Slayer : A Multi-dimensional Analysis
In: Dissertations and Theses (2020)
BASE
171 |
A Computer Science Academic Vocabulary List
In: Dissertations and Theses (2020)
BASE
172 |
SSHOC Workshop: Sharing Datasets of Pathological Speech ...
BASE
173 |
SSHOC Workshop: Sharing Datasets of Pathological Speech ...
BASE
174 |
SSHOC Workshop: Sharing Datasets of Pathological Speech ...
BASE
175 |
Клинико-лингвистические характеристики психических нарушений при ВИЧ-инфицировании ... : Clinical and linguistic characteristics of mental disorders in HIV infection ...
BASE
178 |
Utilizing Geographical Textual Analysis to study the Evolution of Archaeological Thinking ...
BASE