6 |
Towards Instance-Level Parser Selection for Cross-Lingual Transfer of Dependency Parsers
|
|
Glavas, Goran; Agic, Zeljko; Vulic, Ivan; Litschko, Robert. In: Proceedings of the 28th International Conference on Computational Linguistics (COLING 2020). International Committee on Computational Linguistics, 2020. https://www.aclweb.org/anthology/2020.coling-main.345
|
|
Abstract:
Current methods of cross-lingual parser transfer focus on predicting the best parser for a low-resource target language globally, that is, “at treebank level”. In this work, we propose and argue for a novel cross-lingual transfer paradigm: instance-level parser selection (ILPS), and present a proof-of-concept study focused on instance-level selection in the framework of delexicalized parser transfer. Our work is motivated by an empirical observation that different source parsers are the best choice for different Universal POS-sequences (i.e., UPOS sentences) in the target language. We then propose to predict the best parser at the instance level. To this end, we train a supervised regression model, based on the Transformer architecture, to predict parser accuracies for individual POS-sequences. We compare ILPS against two strong single-best parser selection (SBPS) baselines: (1) a model that compares POS n-gram distributions between the source and target languages (KL) and (2) a model that selects the source based on the similarity between manually created language vectors encoding syntactic properties of languages (L2V). The results from our extensive evaluation, coupling 42 source parsers and 20 diverse low-resource test languages, show that ILPS outperforms KL and L2V on 13/20 and 14/20 test languages, respectively. Further, we show that by predicting the best parser “at treebank level” (SBPS), using the aggregation of predictions from our instance-level model, we outperform the same baselines on 17/20 and 16/20 test languages.
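The difference between instance-level selection and treebank-level selection described in the abstract can be illustrated with a minimal Python sketch. It assumes that a regression model has already produced a predicted accuracy for every (sentence, source parser) pair. The function names, the parser labels, and the scores are hypothetical. Averaging is only one possible way to aggregate instance-level predictions, since the abstract does not say which aggregation the authors use.

```python
from statistics import mean


def select_per_instance(predicted):
    """Pick, for each target sentence, the source parser with the
    highest predicted accuracy (instance-level selection).

    `predicted` is a list with one dict per sentence, mapping a
    parser name to its predicted accuracy (hypothetical values).
    """
    return [max(scores, key=scores.get) for scores in predicted]


def select_single_best(predicted):
    """Pick one parser for the whole treebank by averaging the
    instance-level predictions (an assumed aggregation)."""
    parsers = list(predicted[0])
    averages = {p: mean(s[p] for s in predicted) for p in parsers}
    return max(averages, key=averages.get)


# Hypothetical predicted accuracies for three UPOS sequences.
predicted = [
    {"hr": 0.9, "fi": 0.4},
    {"hr": 0.5, "fi": 0.6},
    {"hr": 0.6, "fi": 0.7},
]

per_instance = select_per_instance(predicted)
single_best = select_single_best(predicted)
```

With these made-up scores, the single-best choice is one parser for all sentences, while the per-instance choice switches parsers between sentences, which is the behaviour ILPS is meant to exploit.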
|
|
URL: https://doi.org/10.17863/CAM.62214 https://www.repository.cam.ac.uk/handle/1810/315107
|
|
BASE
|
|
12 |
hr500k – A Reference Training Corpus of Croatian.
|
|
|
|
In: Conference papers (2018)
|
|
BASE
|
|
13 |
Parsing Universal Dependencies without training
|
|
|
|
In: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume 1 ; EACL 2017, Apr 2017, Valencia, Spain, pp. 229-239 ; https://hal.inria.fr/hal-01677405 ; http://eacl2017.org/ (2017)
|
|
BASE
|
|
16 |
Universal Dependencies 2.0 – CoNLL 2017 Shared Task Development and Test Data
|
|
|
|
BASE
|
|
18 |
Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages ...
|
|
|
|
BASE
|
|
19 |
Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages
|
|
|
|
In: Samardžić, Tanja; Starović, Mirjana; Agić, Željko; Ljubešić, Nikola (2017). Universal Dependencies for Serbian in Comparison with Croatian and Other Slavic Languages. In: Proceedings of the 6th Workshop on Balto-Slavic Natural Language Processing, Valencia, Spain, 4 April 2017, Association for Computational Linguistics. (2017)
|
|
BASE
|
|
20 |
Multilingual Projection for Parsing Truly Low-Resource Languages
|
|
|
|
In: Transactions of the Association for Computational Linguistics, The MIT Press, 2016 ; EISSN: 2307-387X ; https://hal.inria.fr/hal-01426754 (2016)
|
|
BASE
|
|