2. Rule-based Morphological Inflection Improves Neural Terminology Translation ...
3. Rule-based Morphological Inflection Improves Neural Terminology Translation ...
4. Evaluating the Evaluation Metrics for Style Transfer: A Case Study in Multilingual Formality Transfer ...
5. Beyond Noise: Mitigating the Impact of Fine-grained Semantic Divergences on Neural Machine Translation ...
6. The UMD Submission to the Explainable MT Quality Estimation Shared Task: Combining Explanation Models with Sequence Labeling ...
7. Evaluating the Evaluation Metrics for Style Transfer: A Case Study in Multilingual Formality Transfer ...
8. How Does Distilled Data Complexity Impact the Quality and Confidence of Non-Autoregressive Machine Translation? ...
9. EDITOR: an Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints ...
10. A Non-Autoregressive Edit-Based Approach to Controllable Text Simplification ...
11. Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank ...
12. Incorporating Terminology Constraints in Automatic Post-Editing ...
13. EDITOR: an Edit-Based Transformer with Repositioning for Neural Machine Translation with Soft Lexical Constraints ...
14. Controlling Neural Machine Translation Formality with Synthetic Supervision ...
15. Controlling Text Complexity in Neural Machine Translation ...
16. Identifying Semantic Divergences Across Languages
Abstract:
Cross-lingual resources such as parallel corpora and bilingual dictionaries are cornerstones of multilingual natural language processing (NLP). They have been used to study the nature of translation, to train machine translation systems, and to transfer models across languages for an array of NLP tasks. However, the majority of work in cross-lingual and multilingual NLP assumes that translations recorded in these resources are semantically equivalent. This is often not the case: words and sentences that are considered to be translations of each other frequently diverge in meaning, often in systematic ways. In this thesis, we focus on such mismatches in meaning in text that we expect to be aligned across languages. We term such mismatches cross-lingual semantic divergences. The core claim of this thesis is that translation is not always meaning-preserving, which leads to cross-lingual semantic divergences that affect multilingual NLP tasks. Detecting such divergences requires ways of directly characterizing differences in meaning across languages through novel cross-lingual tasks, as well as models that account for translation ambiguity and do not rely on expensive, task-specific supervision. We support this claim through three main contributions. First, we show that a large fraction of data in multilingual resources (such as parallel corpora and bilingual dictionaries) is identified as semantically divergent by human annotators. Second, we introduce cross-lingual tasks that characterize differences in word meaning across languages by identifying the semantic relation between two words. We also develop methods to predict such semantic relations, as well as a model to predict whether sentences in different languages have the same meaning. Finally, we demonstrate the impact of divergences by applying the methods developed in the previous sections to two downstream tasks.
We first show that our model for identifying semantic relations between words helps separate equivalent word translations from divergent ones in the context of bilingual dictionary induction, even when the two words are close in meaning. We also show that identifying and filtering semantic divergences in parallel data helps train a neural machine translation system twice as fast without sacrificing quality.
Keywords:
Computer science; lexical semantics; linguistics; machine learning; machine translation; multilingual NLP; natural language processing
URL: http://hdl.handle.net/1903/25448 https://doi.org/10.13016/rymp-ymgo
17. Formality Style Transfer Within and Across Languages with Limited Supervision
18. Identifying Semantic Divergences in Parallel Text without Annotations ...
19. Bi-Directional Neural Machine Translation with Synthetic Parallel Data ...
20. Multi-Task Neural Models for Translating Between Styles Within and Across Languages ...