1 |
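Grenzüberschreitendes Textmining von Historischen Zeitungen - Das impresso-Projekt zwischen Text- und Bildverarbeitung, Design und Geschichtswissenschaft ... [Cross-border text mining of historical newspapers: the impresso project between text and image processing, design, and historical studies]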
BASE
2 |
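Grenzüberschreitendes Textmining von Historischen Zeitungen - Das impresso-Projekt zwischen Text- und Bildverarbeitung, Design und Geschichtswissenschaft ... [Cross-border text mining of historical newspapers: the impresso project between text and image processing, design, and historical studies]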
4 |
Leveraging Cognitive Processing Signals for Natural Language Understanding
Abstract:
In this thesis, we aim to narrow the gap between human language processing and computational language processing. Natural language processing (NLP) models are imperfect and lack intricate capabilities that humans access automatically when processing speech or reading text. Human language processing signals can be leveraged to increase the performance of machine learning (ML) models and to pursue explanatory research for a better understanding of the differences between human and machine language processing. In particular, the contributions of this thesis are threefold:
1. We compile the Zurich Cognitive Language Processing Corpus (ZuCo), a dataset of simultaneous eye tracking and electroencephalography (EEG) recordings from participants reading natural sentences from real-world texts. When we read, our brain processes language and generates cognitive processing signals such as gaze patterns and brain activity. ZuCo includes data of 30 English native speakers, each reading 700-1,100 sentences. This corpus represents a valuable resource for cognitively-inspired NLP.
2. We leverage these cognitive signals to augment ML models for NLP. Compared to purely text-based models, we show consistent improvements across a range of tasks and for both eye tracking and brain activity data. We further explore two of the main challenges in this area: (i) decoding brain activity for language processing and (ii) dealing with limited training data to eliminate the need for recorded cognitive signals at test time.
3. We evaluate the cognitive plausibility of computational language models, the cornerstones of state-of-the-art NLP. We develop CogniVal, the first openly available framework for evaluating English word embeddings based on cognitive lexical semantics. Specifically, embeddings are evaluated by their performance at predicting a wide range of cognitive data sources recorded during language comprehension, including multiple eye tracking datasets and brain activity recordings such as electroencephalography and functional magnetic resonance imaging.
Keywords:
Cognitive science; computer science; data processing; info:eu-repo/classification/ddc/004; machine learning; natural language processing
URL: https://doi.org/10.3929/ethz-b-000472454 https://hdl.handle.net/20.500.11850/472454
5 |
Abstractive Document Summarization in High and Low Resource Settings
6 |
Benchmarking Data-driven Automatic Text Simplification for German
In: Säuberli, Andreas; Ebling, Sarah; Volk, Martin (2020). Benchmarking Data-driven Automatic Text Simplification for German. In: Gala, Nuria; Wilkens, Rodrigo. Proceedings of the 1st Workshop on Tools and Resources to Empower People with REAding DIfficulties (READI). Marseille: European Language Resources Association, 41-48. (2020)
9 |
Modelling large parallel corpora. The Zurich Parallel Corpus Collection ...
10 |
An Empirical Analysis of Linguistic, Typographic, and Structural Features in Simplified German Texts ...
11 |
Modelling Large Parallel Corpora: The Zurich Parallel Corpus Collection
In: Graën, Johannes; Kew, Tannon; Shaitarova, Anastassia; Volk, Martin (2019). Modelling Large Parallel Corpora: The Zurich Parallel Corpus Collection. In: Challenges in the Management of Large Corpora (CMLC-7), Cardiff, Wales, 22 July 2019. (2019)
12 |
Post-editing Productivity with Neural Machine Translation: An Empirical Assessment of Speed and Quality in the Banking and Finance Domain
In: Läubli, Samuel; Amrhein, Chantal; Düggelin, Patrick; Gonzalez, Beatriz; Zwahlen, Alena; Volk, Martin (2019). Post-editing Productivity with Neural Machine Translation: An Empirical Assessment of Speed and Quality in the Banking and Finance Domain. In: Machine Translation Summit XVII, Dublin, Ireland, 19 August 2019 - 23 August 2019, 267-272. (2019)
13 |
Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus ...
14 |
A multilingual gold standard for translation spotting of German compounds and their corresponding multiword units in English, French, Italian and Spanish ...
16 |
Has Machine Translation Achieved Human Parity? A Case for Document-level Evaluation ...
17 |
Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus
In: Clematide, Simon; Furrer, Lenz; Volk, Martin (2018). Crowdsourcing the OCR Ground Truth of a German and French Cultural Heritage Corpus. Journal for Language Technology and Computational Linguistics (JLCL), 33(1):25-47. (2018)
18 |
Parallel Corpora, Terminology Extraction and Machine Translation
In: Volk, Martin (2018). Parallel Corpora, Terminology Extraction and Machine Translation. In: 16. DTT-Symposion. Terminologie und Text(e), Mannheim, 22 March 2018 - 24 March 2018, 3-14. (2018)
19 |
Annotation, exploitation and evaluation of parallel corpora: TC3 I
In: Language Science Press; (2017)
20 |
Annotation, exploitation and evaluation of parallel corpora: TC3 I
In: Language Science Press; (2017)