Catalogue search
Hits 1 – 3 of 3
1
For a fistful of blogs: Discovery and comparative benchmarking of republishable German content
Würzner, Kay-Michael; Barbaresi, Adrien. - 2015
BASE
2
Challenges in the linguistic exploitation of specialized republishable web corpora
Barbaresi, Adrien
In: RESAW conference 2015 ; https://halshs.archives-ouvertes.fr/halshs-01167324 ; RESAW conference 2015, Jun 2015, Aarhus, Denmark (2015)
BASE
3
Collection, Description, and Visualization of the German Reddit Corpus
Barbaresi, Adrien
In: 2nd Workshop on Natural Language Processing for Computer-Mediated Communication ; https://hal.archives-ouvertes.fr/hal-01207311 ; 2nd Workshop on Natural Language Processing for Computer-Mediated Communication, Sep 2015, Essen, Germany. pp.7-11 ; https://sites.google.com/site/nlp4cmc2015/program (2015)
Abstract:
Reddit is a major social bookmarking and microblogging platform. An extensive dataset of Reddit comments has recently been made publicly available. I use a two-tiered filter to single out comments in German in order to build a linguistic corpus, which is then tokenized and annotated. This article offers first insights into both the nature and the quality of the data at the lexical level. Additionally, a visualization helps to grasp the possible geographical distribution of German users of the platform.
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-WB]Computer Science [cs]/Web; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; Computer-mediated Communication; Information Visualization; Language Identification; Web corpus construction
URL:
https://hal.archives-ouvertes.fr/hal-01207311v2/file/Barbaresi_GermanRedditCorpus_2015_archive.pdf
https://hal.archives-ouvertes.fr/hal-01207311
https://hal.archives-ouvertes.fr/hal-01207311v2/document
BASE