3. TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages ...
Abstract:
This paper presents TaPaCo, a freely available paraphrase corpus for 73 languages extracted from the Tatoeba database. Tatoeba is a crowdsourcing project mainly geared towards language learners. Its aim is to provide example sentences and translations for particular linguistic constructions and words. The paraphrase corpus is created by populating a graph with Tatoeba sentences and equivalence links between sentences "meaning the same thing". This graph is then traversed to extract sets of paraphrases. Several language-independent filters and pruning steps are applied to remove uninteresting sentences. A manual evaluation performed on three languages shows that between half and three quarters of the inferred paraphrases are correct, and that most of the remaining ones are either correct but trivial, or near-paraphrases that neutralize a morphological distinction. The corpus contains a total of 1.9 million sentences, with between 200 and 250,000 sentences per language. It covers a range of languages for which, to our knowledge, no ...
Keywords:
Multilingual corpus, Paraphrases, Crowdsourcing
URL: https://zenodo.org/record/3707949 https://dx.doi.org/10.5281/zenodo.3707949
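The graph-based extraction described in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `paraphrase_sets`, the input layout (a dictionary of sentence ids and a list of id pairs), and the rule that same-language sentences within one connected component count as candidate paraphrases are all assumptions made for illustration. The filtering and pruning steps mentioned in the abstract are omitted.

```python
from collections import defaultdict


def paraphrase_sets(sentences, links):
    """Group sentences into candidate paraphrase sets (illustrative only).

    sentences: {sentence_id: (language, text)}   -- hypothetical layout
    links:     iterable of (id_a, id_b) equivalence links
    """
    # Build an undirected adjacency list from the equivalence links.
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)

    seen = set()
    result = []
    for start in sentences:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for neighbour in graph[node]:
                if neighbour not in seen and neighbour in sentences:
                    seen.add(neighbour)
                    stack.append(neighbour)
        # Assumption: sentences of the same language inside one component
        # are treated as candidate paraphrases of each other.
        by_language = defaultdict(set)
        for node in component:
            language, text = sentences[node]
            by_language[language].add(text)
        for language, texts in by_language.items():
            if len(texts) > 1:
                result.append((language, sorted(texts)))
    return result
```

For example, two English sentences that are both linked to the same French sentence end up in one component and are returned together as one English set, while an unlinked sentence yields no set.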
BASE
4. TaPaCo: A Corpus of Sentential Paraphrases for 73 Languages ...
5. LSDC - A Comprehensive Dataset for Low Saxon Dialect Classification ...
6. An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems ...
7. An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems ...
8. An Evaluation Benchmark for Testing the Word Sense Disambiguation Capabilities of Machine Translation Systems ...
12. A quantitative approach to Swiss German - Dialectometric analyses and comparisons of linguistic levels
14. ArchiMob: Ein multidialektales Korpus schweizerdeutscher Spontansprache
In: Linguistik Online; Vol. 98 No. 5 (2019): Alemannische Dialektologie – Forschungsstand und Perspektiven. Sonderheft; 425-454 ; ISSN 1615-3014 (2019)
15. Donnez votre Français à la Science ! Internet et la documentation de la diversité linguistique : présentation de la plateforme et premiers résultats
In: 6e Congrès Mondial de Linguistique Française ; https://hal.archives-ouvertes.fr/hal-02271315 ; 6e Congrès Mondial de Linguistique Française, Jul 2018, Mons, Belgium. ⟨10.1051/shsconf/20184602003⟩ (2018)
16. The WMT'18 Morpheval test suites for English-Czech, English-German, English-Finnish and Turkish-English
In: Proceedings of the Third Conference on Machine Translation ; 3rd Conference on Machine Translation (WMT 18) ; https://hal.archives-ouvertes.fr/hal-01910244 ; 3rd Conference on Machine Translation (WMT 18), Oct 2018, Bruxelles, Belgium. pp.550-564, ⟨10.18653/v1/W18-64060⟩ ; http://www.statmt.org/wmt18/ (2018)
17. Crowdsourcing Regional Variation Data and Automatic Geolocalisation of Speakers of European French
In: International Conference on Language Resources and Evaluation ; https://hal.archives-ouvertes.fr/hal-02271314 ; International Conference on Language Resources and Evaluation, European Language Resources Association (ELRA), May 2018, Miyazaki, Japan (2018)
18. Crowdsourcing regional variables and automatic geolocalisation of speakers of European French
In: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018) ; https://hal.archives-ouvertes.fr/hal-02498762 ; Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018), 2018, Miyazaki, Japan ; https://www.aclweb.org/anthology/L18-1527/ (2018)
19. Measuring Semantic Abstraction of Multilingual NMT with Paraphrase Recognition and Generation Tasks ...