
Search in the Catalogues and Directories

Hits 1 – 7 of 7

1
Bilingual lexicon induction across orthographically-distinct under-resourced Dravidian languages
Chakravarthi, Bharathi Raja (orcid:0000-0002-4575-7934); Rajasekaran, Navaneethan; Arcan, Mihael (orcid:0000-0002-3116-621X); McGuinness, Kevin (orcid:0000-0003-1336-6477); O'Connor, Noel E. (orcid:0000-0002-4033-9135); McCrae, John P. (orcid:0000-0002-7227-1331). In: 7th Workshop on NLP for Similar Languages, Varieties and Dialects, 13 Dec 2020, Barcelona, Spain (Online), 2020
Abstract: Bilingual lexicons are a vital tool for under-resourced languages, and recent state-of-the-art approaches to inducing them leverage pretrained monolingual word embeddings using supervised or semi-supervised methods. However, these approaches require cross-lingual information such as seed dictionaries to train the model and find a linear transformation between the word embedding spaces. Especially in the case of low-resourced languages, seed dictionaries are not readily available, and as such, these methods produce extremely weak results on these languages. In this work, we focus on the Dravidian languages, namely Tamil, Telugu, Kannada, and Malayalam, which are even more challenging as they are written in unique scripts. To take advantage of orthographic information and cognates in these languages, we bring the related languages into a single script. Previous approaches have used linguistically sub-optimal measures such as the Levenshtein edit distance to detect cognates, whereas we demonstrate that the longest common subsequence is linguistically more sound and improves the performance of bilingual lexicon induction. We show that our approach can increase the accuracy of bilingual lexicon induction methods on these languages many times over, making bilingual lexicon induction approaches feasible for such under-resourced languages.
Keyword: Computational linguistics; Information retrieval; Machine translating
URL: http://doras.dcu.ie/25223/
BASE
2
Aspects of Terminological and Named Entity Knowledge within Rule-Based Machine Translation Models for Under-Resourced Neural Machine Translation Scenarios ...
BASE
3
Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages
Chakravarthi, Bharathi Raja; Arcan, Mihael; McCrae, John P. - Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2019. OASIcs - OpenAccess Series in Informatics. 2nd Conference on Language, Data and Knowledge (LDK 2019), 2019
BASE
4
Improving wordnets for under-resourced languages using machine translation
Chakravarthi, Bharathi Raja; Arcan, Mihael; McCrae, John P. - Global Wordnet Association, 2019
BASE
5
WordNet gloss translation for under-resourced languages using multilingual neural machine translation
McCrae, John P.; Arcan, Mihael; Chakravarthi, Bharathi Raja. - European Association for Machine Translation, 2019
BASE
6
Multilingual multimodal machine translation for Dravidian languages utilizing phonetic transcription
Arcan, Mihael; Chakravarthi, Bharathi Raja; Priyadharshini, Ruba. - European Association for Machine Translation, 2019
BASE
7
Comparison of Different Orthographies for Machine Translation of Under-Resourced Dravidian Languages ...
Chakravarthi, Bharathi Raja; Arcan, Mihael; McCrae, John P. - Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik GmbH, Wadern/Saarbruecken, Germany, 2019
BASE
