
Search in the Catalogues and Directories

Hits 1 – 6 of 6

1
Spektrum Patholinguistik = Schwerpunktthema: Hören – Zuhören – Dazugehören : Sprachtherapie bei Hörstörungen und Cochlea-Implantat
Aust, Gottfried [author]; Heinemann, Steffi [author]; Hennies, Johannes [author]. - 2014
DNB Subject Category Language
2
For a fistful of blogs: Discovery and comparative benchmarking of republishable German content
Barbaresi, Adrien [author]; Würzner, Kay-Michael [author]. - Hildesheim : Universität Hildesheim, 2014
DNB Subject Category Language
3
Altersgruppeneffekte in childLex
Heister, Julian [author]; Würzner, Kay-Michael [author]; Schroeder, Sascha [author]. - Potsdam : Universität Potsdam, 2014
DNB Subject Category Language
4
For a fistful of blogs: Discovery and comparative benchmarking of republishable German content
In: KONVENS 2014, NLP4CMC workshop ; https://hal.archives-ouvertes.fr/hal-01083750 ; KONVENS 2014, NLP4CMC workshop, Oct 2014, Hildesheim, Germany. pp.2-10 ; http://www.uni-hildesheim.de/konvens2014/ (2014)
Abstract: We introduce two corpora gathered on the web and related to computer-mediated communication: blog posts and blog comments. In order to build such corpora, we addressed the following issues: website discovery and crawling, content extraction constraints, and text quality assessment. The blogs were manually classified as to their license and content type. Our results show that it is possible to find blogs in German under Creative Commons license, and that it is possible to perform text extraction and linguistic annotation efficiently enough to allow for a comparison with more traditional text types such as newspaper corpora and subtitles. The comparison gives insights on distributional properties of the processed web texts on token and type level. For example, quantitative analysis reveals that blog posts are close to written language, while comments are slightly closer to spoken language.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-WB]Computer Science [cs]/Web; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; ACM: I.: Computing Methodologies/I.2: ARTIFICIAL INTELLIGENCE/I.2.7: Natural Language Processing; ACM: I.: Computing Methodologies/I.7: DOCUMENT AND TEXT PROCESSING/I.7.5: Document Capture; Blogosphere; Computer-Mediated Communication CMC; Corpus Construction; Creative Commons; Quality Assessment; Text Extraction; Visualization; Web Crawling
URL: https://hal.archives-ouvertes.fr/hal-01083750
https://hal.archives-ouvertes.fr/hal-01083750/document
https://hal.archives-ouvertes.fr/hal-01083750/file/Barbaresi-W%C3%BCrzner_Fistful-of-blogs_2014.pdf
BASE
5
Altersgruppeneffekte in childLex
BASE
6
For a fistful of blogs: Discovery and comparative benchmarking of republishable German content
BASE

Catalogues: 3
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 3
© 2013 - 2024 Lin|gu|is|tik