
Search in the Catalogues and Directories

Hits 1 – 3 of 3

1
Induction of latent domains in heterogeneous corpora: a case study of word alignment [Journal]
Cuong, Hoang [Author]; Sima’an, Khalil [Other]
DNB Subject Category Language
2
IOS Press An Efficient Framework for Extracting Parallel Sentences from
In: http://www.jaist.ac.jp/~bao/papers/FIdraft1.pdf
Abstract: Automatically building a large bilingual corpus that contains millions of words is always a challenging task. In particular, for low-resource languages it is difficult to find an existing parallel corpus that is large enough for building a real statistical machine translation system. However, comparable non-parallel corpora are richly available on the Internet, for example in Wikipedia, and valuable parallel texts can be extracted from them. This work presents a framework for effectively extracting parallel sentences from that resource, which significantly improves the performance of statistical machine translation systems. Our framework is a bootstrapping-based method that is strengthened by a new measure for estimating the similarity between two bilingual sentences. We conduct experiments for the English-Vietnamese language pair and obtain promising results both in constructing parallel corpora and in improving the accuracy of machine translation from English to Vietnamese.
Keywords: Parallel sentence extraction; non-parallel comparable corpora; statistical machine translation.
URL: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.431.8374
http://www.jaist.ac.jp/~bao/papers/FIdraft1.pdf
BASE
3
IOS Press An Efficient Framework for Extracting Parallel Sentences
In: http://www.jaist.ac.jp/~bao/papers/FIdraft.pdf
BASE

Catalogues: 1
Bibliographies: 0
Linked Open Data catalogues: 0
Online resources: 0
Open access documents: 2
© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings