1 |
Borderlands of text mapping: Experiments on Fontane's Brandenburg
|
|
|
|
In: Workshop INF-DH-2018 (Informatik und die Digital Humanities) ; https://hal.archives-ouvertes.fr/hal-01951880 ; Workshop INF-DH-2018 (Informatik und die Digital Humanities), Sep 2018, Berlin, Germany. ⟨10.18420/infdh2018-05⟩ (2018)
|
|
BASE
|
|
|
|
2 |
Data-Driven Identification of German Phrasal Compounds
|
|
|
|
In: Text, Speech, and Dialogue ; https://hal.archives-ouvertes.fr/hal-01575651 ; Kamil Ekštein; Václav Matoušek. Text, Speech, and Dialogue, 10415, Springer International Publishing, pp.192-200, 2017, Lecture Notes in Computer Science, 978-3-319-64205-5. ⟨10.1007/978-3-319-64206-2_22⟩ ; https://link.springer.com/bookseries/558 (2017)
|
|
Abstract:
Proceedings of the 20th International Conference, TSD 2017, Prague, Czech Republic, August 27-31, 2017 ; International audience ; We present a method to identify and document a phenomenon on which there is very little empirical data: German phrasal compounds occurring as a single token (without punctuation between their components). Relying on linguistic criteria, our approach requires an operational notion of compounds that can be applied systematically, as well as (web) corpora that are large and diverse enough to contain rarely seen phenomena. The method is based on word segmentation and morphological analysis, and it takes advantage of a data-driven learning process. Our results show that coarse-grained identification of phrasal compounds is best performed with empirical data, whereas fine-grained detection could be improved with a combination of rule-based and frequency-based word lists. Along with the characteristics of web texts, the orthographic realizations seem to be linked to the degree of expressivity.
|
|
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [SHS.LANGUE]Humanities and Social Sciences/Linguistics; ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.1: Content Analysis and Indexing/H.3.1.1: Dictionaries; ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.1: Content Analysis and Indexing/H.3.1.3: Linguistic processing; ACM: H.: Information Systems/H.3: INFORMATION STORAGE AND RETRIEVAL/H.3.7: Digital Libraries/H.3.7.0: Collection; corpus linguistics; morphological analysis; web corpora; word segmentation
|
|
URL: https://hal.archives-ouvertes.fr/hal-01575651 https://hal.archives-ouvertes.fr/hal-01575651/document https://hal.archives-ouvertes.fr/hal-01575651/file/Barbaresi%26Hein_2017_Data-driven-Identification-of-German-Phrase-Compounds.pdf https://doi.org/10.1007/978-3-319-64206-2_22
|
|
|
|
3 |
Discriminating between Similar Languages using Weighted Subword Features
|
|
|
|
In: Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2017) ; https://hal.archives-ouvertes.fr/hal-01575656 ; Fourth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2017), Association for Computational Linguistics (ACL), Apr 2017, Valencia, Spain. pp.184-189, ⟨10.18653/v1/W17-1223⟩ ; http://ttg.uni-saarland.de/vardial2017/ (2017)
|
|
|
|
4 |
Bootstrapped OCR error detection for a less-resourced language variant
|
|
|
|
In: Proceedings of the 13th Conference on Natural Language Processing (KONVENS 2016) ; 13th Conference on Natural Language Processing (KONVENS 2016) ; https://hal.archives-ouvertes.fr/hal-01371689 ; 13th Conference on Natural Language Processing (KONVENS 2016), Sep 2016, Bochum, Germany. pp.21-26 ; https://www.linguistics.ruhr-uni-bochum.de/konvens16/ (2016)
|
|
|
|
5 |
An Unsupervised Morphological Criterion for Discriminating Similar Languages
|
|
|
|
In: 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2016) ; https://hal.archives-ouvertes.fr/hal-01575653 ; 3rd Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2016), Dec 2016, Osaka, Japan. pp.212-220 ; http://ttg.uni-saarland.de/vardial2016/ (2016)
|
|
|
|
6 |
Visualisierung von Ortsnamen im Deutschen Textarchiv
|
|
|
|
In: DHd 2016 ; https://halshs.archives-ouvertes.fr/halshs-01287931 ; DHd 2016, Mar 2016, Leipzig, Germany. pp.264-267 ; http://dhd2016.de/ (2016)
|
|
|
|
7 |
APIs in Digital Humanities: The Infrastructural Turn
|
|
|
|
In: Digital Humanities 2016 ; https://hal.archives-ouvertes.fr/hal-01348706 ; Digital Humanities 2016, Jul 2016, Kraków, Poland. pp.93-96 ; http://dh2016.adho.org/ (2016)
|
|
|
|
8 |
Collection and Indexing of Tweets with a Geographical Focus
|
|
|
|
In: Tenth International Conference on Language Resources and Evaluation (LREC 2016) ; https://hal.archives-ouvertes.fr/hal-01323274 ; Tenth International Conference on Language Resources and Evaluation (LREC 2016), May 2016, Portorož, Slovenia. pp.24-27 (2016)
|
|
|
|
9 |
Extraction and Visualization of Toponyms in Diachronic Text Corpora
|
|
|
|
In: Digital Humanities 2016 ; https://hal.archives-ouvertes.fr/hal-01348696 ; Digital Humanities 2016, Jul 2016, Kraków, Poland. pp.732-734 ; http://dh2016.adho.org/ (2016)
|
|
|
|
10 |
Efficient construction of metadata-enhanced web corpora
|
|
|
|
In: Proceedings of the 10th Web as Corpus Workshop ; 10th Web as Corpus Workshop ; https://hal.archives-ouvertes.fr/hal-01371704 ; 10th Web as Corpus Workshop, Association for Computational Linguistics (ACL SIGWAC), Aug 2016, Berlin, Germany. pp.7-16, ⟨10.18653/v1/W16-2602⟩ (2016)
|
|
|
|
11 |
Collection, Description, and Visualization of the German Reddit Corpus
|
|
|
|
In: 2nd Workshop on Natural Language Processing for Computer-Mediated Communication ; https://hal.archives-ouvertes.fr/hal-01207311 ; 2nd Workshop on Natural Language Processing for Computer-Mediated Communication, Sep 2015, Essen, Germany. pp.7-11 ; https://sites.google.com/site/nlp4cmc2015/program (2015)
|
|
|
|
|
|