1. Towards Arabic Sentence Simplification via Classification and Generative Approaches ...
2. Overview of the Fourth BUCC Shared Task: Bilingual Dictionary Induction from Comparable Corpora
In: 13th Workshop on Building and Using Comparable Corpora (BUCC), May 2020, Marseille, France, pp. 6-13. https://hal.archives-ouvertes.fr/hal-03100822
3. Know thy corpus! Robust methods for digital curation of Web corpora ...
Abstract:
This paper proposes a novel framework for digital curation of Web corpora in order to provide robust estimation of their parameters, such as their composition and lexicon. In recent years, language models pre-trained on large corpora have emerged as clear winners in numerous NLP tasks, but no proper analysis of the corpora that led to their success has been conducted. The paper presents a procedure for robust frequency estimation, which helps in establishing the core lexicon for a given corpus, as well as a procedure for estimating the corpus composition via unsupervised topic models and via supervised genre classification of Web pages. The results of the digital curation study applied to several Web-derived corpora demonstrate considerable differences between them. First, the corpora differ in their frequency bursts, which affect the core lexicon obtained from each corpus. Second, they differ in the kinds of texts they contain. For example, OpenWebText contains considerably more topical news and political argumentation ...
Keyword:
Computation and Language cs.CL; FOS Computer and information sciences
URL: https://dx.doi.org/10.48550/arxiv.2003.06389 https://arxiv.org/abs/2003.06389
4. Recognizing semantic relations by combining transformers and fully connected models
5. A Multilingual Dataset for Evaluating Parallel Sentence Extraction from Comparable Corpora
In: International Conference on Language Resources and Evaluation, May 2018, Miyazaki, Japan. https://hal.archives-ouvertes.fr/hal-01898362
8. Language Adaptation for Extending Post-Editing Estimates for Closely Related Languages
In: Prague Bulletin of Mathematical Linguistics, Vol. 106, Iss. 1, pp. 181-192 (2016)
17. Terminology Extraction, Translation Tools and Comparable Corpora: TTC concept, midterm progress and achieved results
In: LREC 2012 Workshop on Creating Cross-language Resources for Disconnected Languages and Styles (CREDISLAS), May 2012, Istanbul, Turkey, 4 p. https://hal.archives-ouvertes.fr/hal-00819909
19. User-centred Views on Terminology Extraction Tools: Usage Scenarios and Integration into MT and CAT Tools.
In: Actes du colloque Tralogy : Anticiper les technologies pour la traduction ; Tralogy I. Métiers et technologies de la traduction : quelles convergences pour l'avenir ?, Mar 2011, Paris, France, 10 p. https://hal.archives-ouvertes.fr/hal-00818657