1. Borderlands of text mapping: Experiments on Fontane's Brandenburg
In: Workshop INF-DH-2018 (Informatik und die Digital Humanities) ; https://hal.archives-ouvertes.fr/hal-01951880 ; Workshop INF-DH-2018 (Informatik und die Digital Humanities), Sep 2018, Berlin, Germany. ⟨10.18420/infdh2018-05⟩ (2018)
2. Data-Driven Identification of German Phrasal Compounds
In: Text, Speech, and Dialogue ; https://hal.archives-ouvertes.fr/hal-01575651 ; Kamil Ekštein; Václav Matoušek. Text, Speech, and Dialogue, 10415, Springer International Publishing, pp.192-200, 2017, Lecture Notes in Computer Science, 978-3-319-64205-5. ⟨10.1007/978-3-319-64206-2_22⟩ ; https://link.springer.com/bookseries/558 (2017)
3. Discriminating between Similar Languages using Weighted Subword Features
In: Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2017) ; https://hal.archives-ouvertes.fr/hal-01575656 ; Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2017), Association for Computational Linguistics (ACL), Apr 2017, Valencia, Spain. pp.184-189, ⟨10.18653/v1/W17-1223⟩ ; http://ttg.uni-saarland.de/vardial2017/ (2017)
4. Bootstrapped OCR error detection for a less-resourced language variant
In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016) ; 13th Conference on Natural Language Processing (KONVENS 2016) ; https://hal.archives-ouvertes.fr/hal-01371689 ; 13th Conference on Natural Language Processing (KONVENS 2016), Sep 2016, Bochum, Germany. pp.21-26 ; https://www.linguistics.ruhr-uni-bochum.de/konvens16/ (2016)
5. An Unsupervised Morphological Criterion for Discriminating Similar Languages
In: 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2016) ; https://hal.archives-ouvertes.fr/hal-01575653 ; 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2016), Dec 2016, Osaka, Japan. pp.212-220 ; http://ttg.uni-saarland.de/vardial2016/ (2016)
6. Visualisierung von Ortsnamen im Deutschen Textarchiv [Visualization of place names in the Deutsches Textarchiv]
In: DHd 2016 ; https://halshs.archives-ouvertes.fr/halshs-01287931 ; DHd 2016, Mar 2016, Leipzig, Germany. pp.264-267 ; http://dhd2016.de/ (2016)
7. APIs in Digital Humanities: The Infrastructural Turn
In: Digital Humanities 2016 ; https://hal.archives-ouvertes.fr/hal-01348706 ; Digital Humanities 2016, Jul 2016, Kraków, Poland. pp.93-96 ; http://dh2016.adho.org/ (2016)
8. Collection and Indexing of Tweets with a Geographical Focus
In: Tenth International Conference on Language Resources and Evaluation (LREC 2016) ; https://hal.archives-ouvertes.fr/hal-01323274 ; Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 2016, Portorož, Slovenia. pp.24-27 (2016)
Abstract:
This paper introduces a Twitter corpus that is currently geographically focused, built in order to (1) test selection and collection processes for a given region and (2) find a suitable database to query, filter, and visualize the tweets. Due to access restrictions, it is not possible to retrieve all available tweets, which is why corpus construction involves a series of decisions, described below. The corpus focuses on Austrian users: data collection relies on a two-tier detection process addressing corpus construction and user location issues. The emphasis lies on tweets whose senders give a place in Austria as their hometown, or on tweets sent from places located in Austria. The resulting user base is then queried and enlarged using focused crawling and random sampling, so that the corpus is refined and extended in the manner of a monitor corpus. Its current volume is 21.7 million tweets from approximately 125,000 users. The tweets are indexed using Elasticsearch and queried via the Kibana frontend, which allows for queries on metadata as well as for the visualization of geolocalized tweets (currently about 3.3% of the collection).
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-WB]Computer Science [cs]/Web; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; Computer-Mediated Communication; Database Solutions; Visualization; Web Corpus Construction
URL: https://hal.archives-ouvertes.fr/hal-01323274v3/document https://hal.archives-ouvertes.fr/hal-01323274 https://hal.archives-ouvertes.fr/hal-01323274v3/file/Barbaresi_CMLC2016_Twitter_archive.pdf
9. Extraction and Visualization of Toponyms in Diachronic Text Corpora
In: Digital Humanities 2016 ; https://hal.archives-ouvertes.fr/hal-01348696 ; Digital Humanities 2016, Jul 2016, Kraków, Poland. pp.732-734 ; http://dh2016.adho.org/ (2016)
10. Efficient construction of metadata-enhanced web corpora
In: Proceedings of the 10th Web as Corpus Workshop ; 10th Web as Corpus Workshop ; https://hal.archives-ouvertes.fr/hal-01371704 ; 10th Web as Corpus Workshop, Association for Computational Linguistics (ACL SIGWAC), Aug 2016, Berlin, Germany. pp.7-16, ⟨10.18653/v1/W16-2602⟩ (2016)
11. Collection, Description, and Visualization of the German Reddit Corpus
In: 2nd Workshop on Natural Language Processing for Computer-Mediated Communication ; https://hal.archives-ouvertes.fr/hal-01207311 ; 2nd Workshop on Natural Language Processing for Computer-Mediated Communication, Sep 2015, Essen, Germany. pp.7-11 ; https://sites.google.com/site/nlp4cmc2015/program (2015)