1 |
Lessons learned in quality management for online research software tools in linguistics ...
BASE
4 |
Proceedings of the Workshop on Challenges in the Management of Large Corpora (CMLC-9) 2021. Limerick, 12 July 2021 (Online-Event) ...
BASE
7 |
Proceedings of the LREC 2020: 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8)
In: Proceedings of the LREC 2020: 8th Workshop on Challenges in the Management of Large Corpora (CMLC-8). Edited by: Bański, Piotr; Barbaresi, Adrien; Clematide, Simon; Kupietz, Marc; Lüngen, Harald; Pisetta, Ines (2020). Marseille, France: European Language Resources Association.
BASE
8 |
Neues von KorAP
In: Neues vom heutigen Deutsch (2019)
IDS Mannheim
9 |
Analyzing domain specific word embeddings for a large corpus of contemporary German. International Corpus Linguistics Conference, Cardiff, Wales, UK, July 22-26, 2019 ...
BASE
10 |
Modelling Large Parallel Corpora: The Zurich Parallel Corpus Collection
In: Graën, Johannes; Kew, Tannon; Shaitarova, Anastassia; Volk, Martin (2019). Modelling Large Parallel Corpora: The Zurich Parallel Corpus Collection. In: Challenges in the Management of Large Corpora (CMLC-7), Cardiff, Wales, 22 July 2019.
BASE
12 |
Visualisierung sprachlicher Daten ... : Visual Linguistics – Praxis – Tools ...
BASE
13 |
Increasing Interoperability for Embedding Corpus Annotation Pipelines in Wmatrix and other corpus retrieval tools
BASE
14 |
Visual Linguistics : Plädoyer für ein neues Forschungsfeld
In: Bubenhofer, Noah (2018). Visual Linguistics : Plädoyer für ein neues Forschungsfeld. In: Bubenhofer, Noah; Kupietz, Marc. Visualisierung sprachlicher Daten : Visual Linguistics – Praxis – Tools. Heidelberg: Heidelberg University Publishing, 25-62.
BASE
15 |
Challenges in the Management of Large Corpora (CMLC-6)
In: Challenges in the Management of Large Corpora (CMLC-6). Edited by: Bański, Piotr; Kupietz, Marc; Barbaresi, Adrien; Biber, Hanno; Breiteneder, Evelyn; Clematide, Simon; Witt, Andreas (2018). Paris: European Language Resources Association (ELRA).
BASE
16 |
Möglichkeiten der Erforschung grammatischer Variation mithilfe von KorAP
In: Grammatische Variation (2017)
IDS Mannheim
18 |
Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database
In: Graën, Johannes; Clematide, Simon; Volk, Martin (2016). Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database. In: 4th Workshop on the Challenges in the Management of Large Corpora, Portorož, 28 May 2016, 20-23.
Abstract:
We present an approach for searching and exploring translation variants of multi-word units in large multiparallel corpora based on a relational database management system. Our web-based application Multilingwis, which allows for multilingual lookups of phrases and words in English, French, German, Italian and Spanish, is of interest to anybody who wants to quickly compare expressions across several languages, such as language learners without linguistic knowledge. In this paper, we focus on the technical aspects of how to represent and efficiently retrieve all occurrences that match the user’s query in one of five languages, together with their translations into the other four languages. To identify such translations in our corpus of 220 million tokens in total, we use statistical sentence and word alignment. By using materialized views, composite indexes, and pre-planned search functions, our relational database management system handles large result sets with only moderate demands on the underlying hardware. As our systematic evaluation on 200 search terms per language shows, we can achieve retrieval times below 1 second in 75 % of the cases for multi-word expressions.
Keywords:
000 Computer science, knowledge & systems; 410 Linguistics; Institute of Computational Linguistics
URL: https://www.zora.uzh.ch/id/eprint/124373/1/cmlc4.pdf https://doi.org/10.5167/uzh-124373 https://www.zora.uzh.ch/id/eprint/124373/ http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-CMLC_Proceedings.pdf
BASE
19 |
Schriftliche und mündliche Korpora am IDS als Grundlage für die empirische Forschung
In: Sprachwissenschaft im Fokus (2015)
IDS Mannheim