1. Establishing a New State-of-the-Art for French Named Entity Recognition
In: LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France ; https://hal.inria.fr/hal-02617950 ; http://www.lrec-conf.org (2020)
2. OFrLex: A Computational Morphological and Syntactic Lexicon for Old French
In: LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France, 3217-3225 (updated version) ; https://hal.inria.fr/hal-02677957 (2020)
3. Controllable Sentence Simplification
In: LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France ; https://hal.inria.fr/hal-02678214 ; http://www.lrec-conf.org/proceedings/lrec2020/index.html (2020)
4. Building a User-Generated Content North-African Arabizi Treebank: Tackling Hell
In: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States ; https://hal.inria.fr/hal-02889804 ; ⟨10.18653/v1/2020.acl-main.107⟩ (2020)
5. CamemBERT: a Tasty French Language Model
In: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States ; https://hal.inria.fr/hal-02889805 ; ⟨10.18653/v1/2020.acl-main.645⟩ (2020)
6. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
In: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States ; https://hal.inria.fr/hal-02863875 ; ⟨10.18653/v1/2020.acl-main.156⟩ ; https://acl2020.org (2020)
7. Methodological Aspects of Developing and Managing an Etymological Lexical Resource: Introducing EtymDB 2.0
In: LREC 2020 - 12th Language Resources and Evaluation Conference, May 2020, Marseille, France ; https://hal.inria.fr/hal-02678100 (2020)
8. French Contextualized Word-Embeddings with a sip of CaBeRnet: a New French Balanced Reference Corpus
In: CMLC-8 - 8th Workshop on the Challenges in the Management of Large Corpora, May 2020, Marseille, France ; https://hal.inria.fr/hal-02678358 ; https://lrec2020.lrec-conf.org/media/proceedings/Workshops/Books/CMLC-8book.pdf (2020)
9. ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations
In: ACL 2020 - 58th Annual Meeting of the Association for Computational Linguistics, Jul 2020, Seattle / Virtual, United States ; https://hal.inria.fr/hal-02889823 (2020)
10. Evaluating the reliability of acoustic speech embeddings
In: INTERSPEECH 2020 - Annual Conference of the International Speech Communication Association, Oct 2020, Shanghai / Virtual, China ; https://hal.inria.fr/hal-02977539 (2020)
11. When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
In: https://hal.inria.fr/hal-03109106 (2020)
Abstract:
Transfer learning based on pretraining language models on large amounts of raw data has become the new norm for reaching state-of-the-art performance in NLP. Still, it remains unclear how this approach should be applied to unseen languages, that is, languages not covered by any available large-scale multilingual language model and for which generally only a small amount of raw data is available. In this work, by comparing multilingual and monolingual models, we show that such models behave in several distinct ways on unseen languages. Some languages benefit greatly from transfer learning and behave similarly to closely related high-resource languages, whereas others apparently do not. Focusing on the latter, we show that this failure to transfer is largely related to the script used to write these languages. Transliterating them very significantly improves the performance of large-scale multilingual language models on downstream tasks.
Keyword:
[INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]
URL: https://hal.inria.fr/hal-03109106
12. Comparing Statistical and Neural Models for Learning Sound Correspondences
In: LT4HALA 2020 : First Workshop on Language Technologies for Historical and Ancient Languages, May 2020, Marseille, France ; https://hal.inria.fr/hal-02529929 (2020)
15. Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi ...
16. Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering ...
17. A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages ...
18. When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models ...
19. MUSS: Multilingual Unsupervised Sentence Simplification by Mining Paraphrases ...
20. ASSET: A Dataset for Tuning and Evaluation of Sentence Simplification Models with Multiple Rewriting Transformations ...