21 | Simplification Using Paraphrases and Context-Based Lexical Substitution
In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Jun 2018, New Orleans, United States (2018) ; https://hal.archives-ouvertes.fr/hal-01838519
BASE
22 | Automated Paraphrase Lattice Creation for HyTER Machine Translation Evaluation
In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Jun 2018, New Orleans, United States (2018) ; https://hal.archives-ouvertes.fr/hal-01838521
23 | Comparing Constraints for Taxonomic Organization
In: Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Association for Computational Linguistics, Jun 2018, New Orleans, United States (2018) ; https://hal.archives-ouvertes.fr/hal-01838520
24 | Mapping the Paraphrase Database to WordNet
In: Conference on Lexical and Computational Semantics, Aug 2017, Vancouver, Canada (2017) ; https://hal.archives-ouvertes.fr/hal-01838527
25 | Learning Antonyms with Paraphrases and a Morphology-aware Neural Network
In: Conference on Lexical and Computational Semantics, Aug 2017, Vancouver, Canada (2017) ; https://hal.archives-ouvertes.fr/hal-01838526
26 | Word Sense Filtering Improves Embedding-Based Lexical Substitution
In: Conference of the European Chapter of the Association for Computational Linguistics, Apr 2017, Valencia, Spain (2017) ; https://hal.archives-ouvertes.fr/hal-01838524
27 | Learning Translations via Matrix Completion
In: Conference on Empirical Methods in Natural Language Processing, Sep 2017, Copenhagen, Denmark (2017) ; https://hal.archives-ouvertes.fr/hal-01838532
28 | KnowYourNyms? A Game of Semantic Relationships
In: Conference on Empirical Methods in Natural Language Processing, Sep 2017, Copenhagen, Denmark (2017) ; https://hal.archives-ouvertes.fr/hal-01838528
29 | Use of Modality and Negation in Semantically-Informed Syntactic MT ...
31 | FEATURE-DRIVEN QUESTION ANSWERING WITH NATURAL LANGUAGE ALIGNMENT
32 | Using Comparable Corpora to Augment Statistical Machine Translation Models in Low Resource Settings
Abstract:
Previously, statistical machine translation (SMT) models have been estimated from parallel corpora, or pairs of translated sentences. In this thesis, we directly incorporate comparable corpora into the estimation of end-to-end SMT models. In contrast to parallel corpora, comparable corpora are pairs of monolingual corpora that have some cross-lingual similarities, for example topic or publication date, but that do not necessarily contain any direct translations. Comparable corpora are more readily available in large quantities than parallel corpora, which require significant human effort to compile. We use comparable corpora to estimate machine translation model parameters and show that doing so improves performance in settings where a limited amount of parallel data is available for training.

The major contributions of this thesis are the following:
* We release ‘language packs’ for 151 human languages, which include bilingual dictionaries, comparable corpora of Wikipedia document pairs, comparable corpora of time-stamped news text that we harvested from the web, and, for non-roman script languages, dictionaries of name pairs, which are likely to be transliterations.
* We present a novel technique for using a small number of example word translations to learn a supervised model for bilingual lexicon induction which takes advantage of a wide variety of signals of translation equivalence that can be estimated over comparable corpora.
* We show that using comparable corpora to induce new translations and estimate new phrase table feature functions improves end-to-end statistical machine translation performance for low resource language pairs as well as domains.
* We present a novel algorithm for composing multiword phrase translations from multiple unigram translations and then use comparable corpora to prune the large space of hypothesis translations. We show that these induced phrase translations improve machine translation performance beyond that of component unigrams.

This thesis focuses on critical low resource machine translation settings, where insufficient parallel corpora exist for training statistical models. We experiment with both low resource language pairs and low resource domains of text. We present results from our novel error analysis methodology, which show that most translation errors in low resource settings are due to unseen source language words and phrases and unseen target language translations. We also find room for fixing errors due to how different translations are weighted, or scored, in the models. We target both error types; we use comparable corpora to induce new word and phrase translations and estimate novel translation feature scores. Our experiments show that augmenting baseline SMT systems with new translations and features estimated over comparable corpora improves translation performance significantly. Additionally, our techniques expand the applicability of statistical machine translation to those language pairs for which zero parallel text is available.
Keyword: machine translation; natural language processing
URL: http://jhir.library.jhu.edu/handle/1774.2/38018
33 | Bucking the Trend: Large-Scale Cost-Focused Active Learning for Statistical Machine Translation ...
34 | Fisher and CALLHOME Spanish-English Speech Translation ...
35 | Semantically-Informed Syntactic Machine Translation: A Tree-Grafting Approach ...
36 | Dirt cheap web-scale parallel text from the Common Crawl ...
37 | Dirt cheap web-scale parallel text from the Common Crawl
In: Smith, Jason R; Saint-Amand, Herve; Plamada, Magdalena; Koehn, Philipp; Callison-Burch, Chris; Lopez, Adam (2013). Dirt cheap web-scale parallel text from the Common Crawl. In: 51st Annual Meeting of the Association for Computational Linguistics, Sofia, Bulgaria, August 2013. Association for Computational Linguistics, 1374-1383. (2013)
38 | Use of Modality and Negation in Semantically-Informed Syntactic MT
In: DTIC (2012)
39 | Use of Modality and Negation in Semantically-Informed Syntactic MT
40 | Incremental Syntactic Language Models for Phrase-Based Translation
In: DTIC (2011)