68 |
Proceedings of the Workshop on Challenges in the Management of Large Corpora and Big Data and Natural Language Processing (CMLC-5+BigNLP) 2017 including the papers from the Web-as-Corpus (WAC-XI) guest section. Birmingham, 24 July 2017
DNB Subject Category Language

70 |
Align and Copy: UZH at SIGMORPHON 2017 Shared Task for Morphological Reinflection ...
BASE

72 |
CLUZH at VarDial GDI 2017: Testing a Variety of Machine Learning Tools for the Classification of Swiss German Dialects ...
BASE

73 |
A multilingual gold-standard corpus for biomedical concept recognition: the Mantra GSC
BASE

74 |
Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities
In: Clematide, Simon; Meraner, Isabel; Bubenhofer, Noah; Volk, Martin (2017). Lessons from a Massive Open Online Course (MOOC) on Natural Language Processing for Digital Humanities. In: Teaching NLP for Digital Humanities, Berlin, 12 September 2017, 17-22. (2017)
BASE

77 |
Crowdsourcing Swiss Dialect Transcriptions for Assessing Factors in Writing Variations ...
BASE

78 |
Multilingwis – A Multilingual Search Tool for Multi-Word Units in Multiparallel Corpora ...
BASE

79 |
Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database
In: Graën, Johannes; Clematide, Simon; Volk, Martin (2016). Efficient Exploration of Translation Variants in Large Multiparallel Corpora Using a Relational Database. In: 4th Workshop on the Challenges in the Management of Large Corpora, Portorož, 28 May 2016, 20-23. (2016)
Abstract:
We present an approach for searching and exploring translation variants of multi-word units in large multiparallel corpora based on a relational database management system. Our web-based application Multilingwis, which allows for multilingual lookups of phrases and words in English, French, German, Italian and Spanish, is of interest to anybody who wants to quickly compare expressions across several languages, such as language learners without linguistic knowledge. In this paper, we focus on the technical aspects of how to represent and efficiently retrieve all occurrences that match the user’s query in one of five languages simultaneously with their translations into the other four languages. In order to identify such translations in our corpus of 220 million tokens in total, we use statistical sentence and word alignment. By using materialized views, composite indexes, and pre-planned search functions, our relational database management system handles large result sets with only moderate requirements to the underlying hardware. As our systematic evaluation on 200 search terms per language shows, we can achieve retrieval times below 1 second in 75 % of the cases for multi-word expressions.
Keyword:
000 Computer science; 410 Linguistics; Institute of Computational Linguistics; knowledge & systems
URL: https://www.zora.uzh.ch/id/eprint/124373/1/cmlc4.pdf https://doi.org/10.5167/uzh-124373 https://www.zora.uzh.ch/id/eprint/124373/ http://www.lrec-conf.org/proceedings/lrec2016/workshops/LREC2016Workshop-CMLC_Proceedings.pdf
BASE

80 |
Multilingwis – A Multilingual Search Tool for Multi-Word Units in Multiparallel Corpora
In: Clematide, Simon; Graën, Johannes; Volk, Martin (2016). Multilingwis – A Multilingual Search Tool for Multi-Word Units in Multiparallel Corpora. In: Corpas Pastor, Gloria. Computerised and Corpus-based Approaches to Phraseology: Monolingual and Multilingual Perspectives/Fraseología computacional y basada en corpus: perspectivas monolingües y multilingües. Geneva: Tradulex, n/a. (2016)
BASE