1. The language demographics of Amazon Mechanical Turk
In: https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/viewFile/262/34/ (2014)

2. The language demographics of Amazon Mechanical Turk. Transactions of the Association for Computational Linguistics
In: http://www.seas.upenn.edu/%7Eepavlick/papers/language_demographics_mturk.pdf (2014)

3. The language demographics of Amazon Mechanical Turk. Transactions of the Association for Computational Linguistics
In: http://www.cis.upenn.edu/~ccb/publications/language-demographics-of-mechanical-turk.pdf (2014)

4. PPDB: The paraphrase database
In: http://aclweb.org/anthology-new/N/N13/N13-1092.pdf (2013)

5. Dirt cheap web-scale parallel text from the Common Crawl
In: http://wing.comp.nus.edu.sg/~antho/P/P13/P13-1135.pdf (2013)

6. Dirt cheap web-scale parallel text from the Common Crawl
In: http://www.cs.jhu.edu/~alopez/papers/acl2013-smith+etal.pdf (2013)
Abstract: Parallel text is the fuel that drives modern machine translation systems. The Web is a comprehensive source of preexisting parallel text, but crawling the entire web is impossible for all but the largest companies. We bring web-scale parallel text to the masses by mining the Common Crawl, a public Web crawl hosted on Amazon’s Elastic Cloud. Starting from nothing more than a set of common two-letter language codes, our open-source extension of the STRAND algorithm mined 32 terabytes of the crawl in just under a day, at a cost of about $500. Our large-scale experiment uncovers large amounts of parallel text in dozens of language pairs across a variety of domains and genres, some previously unavailable in curated datasets. Even with minimal cleaning and filtering, the resulting data boosts translation performance across the board for five different language pairs in the news domain, and on open domain test sets we see improvements of up to 5 BLEU. We make our code and data available for other researchers seeking to mine this rich new data resource.
Also available at: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.348.4857

7. Toward statistical machine translation without parallel corpora
In: http://www.cs.jhu.edu/~anni/papers/lowresmt/lowresmt.pdf (2012)

8. Toward statistical machine translation without parallel corpora
In: http://people.mmci.uni-saarland.de/~aklement/publications/eacl12mt.pdf (2012)

9. Constructing parallel corpora for six Indian languages via crowdsourcing
In: http://www.aclweb.org/anthology/W12-3152/ (2012)

10. Arabic dialect identification
In: http://www.aclweb.org/anthology/J/J14/J14-1006.pdf (2012)

11. Machine translation of Arabic dialects
In: http://www.aclweb.org/anthology-new/N/N12/N12-1006.pdf (2012)

12. Learning sentential paraphrases from bilingual parallel corpora for text-to-text generation
In: http://www.cs.jhu.edu/~ccb/publications/learning-sentential-paraprhases-from-bilingual-parallel-corpora.pdf (2011)

13. Incremental syntactic language models for phrase-based translation
In: http://wing.comp.nus.edu.sg/~antho/P/P11/P11-1063.pdf (2011)

14. Incremental syntactic language models for phrase-based translation
In: http://www.cis.upenn.edu/~ccb/publications/incremental-syntactic-language-models-for-phrase-based-translation.pdf (2011)

15. Crowdsourcing Translation: Professional Quality from Non-Professionals
In: http://cs.jhu.edu/%7Eozaidan/AOC/turk-trans_Zaidan-CCB_acl2011.pdf (2011)

16. Paraphrastic sentence compression with a character-based metric: Tightening without deletion
In: http://www-csli.stanford.edu/%7Eccb/publications/paraphrastic-sentence-compression.pdf (2011)

17. Paraphrastic sentence compression with a character-based metric: Tightening without deletion
In: http://aclweb.org/anthology-new/W/W11/W11-1610.pdf (2011)

18. at
In: http://www-csli.stanford.edu/~ccb/publications/hiero-grammar-extraction-with-suffix-arrays.pdf (2010)

19. Bilingual Lexicon Induction for Low-resource Languages
In: http://hltcoe.files.wordpress.com/2011/09/tr5bilinguallexicalinduction.pdf (2010)

20. Moving Beyond Phrase Pairs: The Relevance of the Corpus in a SMT World
In: http://www.cs.cmu.edu/%7Eaphillip/publications/proposal.pdf (2010)