Search in the Catalogues and Directories

Hits 1 – 20 of 165

1	Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
	Mielke, Sabrina J.; Alyafeai, Zaid; Salesky, Elizabeth...
	In: https://hal.inria.fr/hal-03540069 (2022)
	BASE

2	Automatic Normalisation of Early Modern French
	Bawden, Rachel; Poinhos, Jonathan; Kogkitsidou, Eleni...
	In: https://hal.inria.fr/hal-03540226 (2022)
	BASE

3	From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French
	Gabay, Simon; Ortiz Suarez, Pedro; Bartz, Alexandre...
	In: Proceedings of the 13th Language Resources and Evaluation Conference, European Language Resources Association, Jun 2022, Marseille, France ; https://hal.inria.fr/hal-03596653 (2022)
	BASE

4	Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
	Abadji, Julien; Ortiz Suarez, Pedro; Romary, Laurent...
	In: https://hal.inria.fr/hal-03536361 (2022)
	BASE

5	Probing Multilingual Cognate Prediction Models
	Fourrier, Clémentine; Sagot, Benoît
	In: Findings of the Association for Computational Linguistics: ACL 2022, May 2022, Dublin, Ireland ; https://hal.inria.fr/hal-03614691 (2022)
	BASE

6	Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
	Riabi, Arij; Sagot, Benoît; Seddah, Djamé
	In: Seventh Workshop on Noisy User-generated Text (W-NUT 2021, co-located with EMNLP 2021), Jan 2022, Punta Cana, Dominican Republic ; https://hal.inria.fr/hal-03527328 ; https://aclanthology.org/2021.wnut-1.47/ (2022)
	Abstract: Recent impressive improvements in NLP, largely based on the success of contextual neural language models, have been mostly demonstrated on at most a couple dozen high-resource languages. Building language models and, more generally, NLP systems for non-standardized and low-resource languages remains a challenging task. In this work, we focus on North-African colloquial dialectal Arabic written using an extension of the Latin script, called NArabizi, found mostly on social media and messaging communication. In this low-resource scenario with data displaying a high level of variability, we compare the downstream performance of a character-based language model on part-of-speech tagging and dependency parsing to that of monolingual and multilingual models. We show that a character-based model trained on only 99k sentences of NArabizi and fine-tuned on a small treebank of this language leads to performance close to that obtained with the same architecture pre-trained on large multilingual and monolingual models. Confirming these results on a much larger data set of noisy French user-generated content, we argue that such character-based language models can be an asset for NLP in low-resource and high language variability settings.
	Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-SI]Computer Science [cs]/Social and Information Networks [cs.SI]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing
	URL: https://hal.inria.fr/hal-03527328
	BASE

7	Towards a Cleaner Document-Oriented Multilingual Crawled Corpus ...
	Abadji, Julien; Suarez, Pedro Ortiz; Romary, Laurent. - : arXiv, 2022
	BASE

8	Rethinking Automatic Evaluation in Sentence Simplification
	Scialom, Thomas; Martin, Louis; Staiano, Jacopo...
	In: https://hal.inria.fr/hal-03199901 (2021)
	BASE

9	Multilingual Unsupervised Sentence Simplification
	Martin, Louis; Fan, Angela; de la Clergerie, Éric...
	In: https://hal.inria.fr/hal-03109299 (2021)
	BASE

10	Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus
	Abadji, Julien; Ortiz Suárez, Pedro Javier; Romary, Laurent...
	In: CMLC 2021 - 9th Workshop on Challenges in the Management of Large Corpora, Jul 2021, Limerick / Virtual, Ireland ; https://hal.inria.fr/hal-03301590 ; doi:10.14618/ids-pub-10468 ; https://www.cl2021.org/ (2021)
	BASE

11	First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT
	Muller, Benjamin; Elazar, Yanai; Sagot, Benoît...
	In: https://hal.inria.fr/hal-03161685 (2021)
	BASE

12	Can Multilingual Language Models Transfer to an Unseen Dialect? A Case Study on North African Arabizi
	Muller, Benjamin; Sagot, Benoît; Seddah, Djamé
	In: https://hal.inria.fr/hal-03161677 (2021)
	BASE

13	First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT
	Muller, Benjamin; Elazar, Yanai; Sagot, Benoît...
	In: EACL 2021 - The 16th Conference of the European Chapter of the Association for Computational Linguistics, Apr 2021, Kyiv / Virtual, Ukraine ; https://hal.inria.fr/hal-03239087 ; https://2021.eacl.org/ (2021)
	BASE

14	When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models
	Muller, Benjamin; Anastasopoulos, Antonios; Sagot, Benoît...
	In: NAACL-HLT 2021 - 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Jun 2021, Mexico City, Mexico ; https://hal.inria.fr/hal-03251105 (2021)
	BASE

15	Quality at a Glance: An Audit of Web-Crawled Multilingual Datasets
	Caswell, Isaac; Kreutzer, Julia; Wang, Lisa...
	In: https://hal.inria.fr/hal-03177623 (2021)
	BASE

16	Can Cognate Prediction Be Modelled as a Low-Resource Machine Translation Task?
	Fourrier, Clémentine; Bawden, Rachel; Sagot, Benoît
	In: Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Aug 2021, Bangkok, Thailand ; https://hal.inria.fr/hal-03243380 (2021)
	BASE

17	Synthetic Data Augmentation for Zero-Shot Cross-Lingual Question Answering
	Riabi, Arij; Scialom, Thomas; Keraron, Rachel...
	In: https://hal.inria.fr/hal-03109187 (2021)
	BASE

18	Variation graphique dans les documents d'Ancien Régime : Nouvelles approches scriptométriques [Graphic variation in Ancien Régime documents: new scriptometric approaches]
	Gabay, Simon; Gambette, Philippe; Bawden, Rachel...
	In: Journée d’étude : « Pour une histoire de la langue ‘par en bas’: textes privés et variation des langues dans le passé » [Study day: "Towards a history of language 'from below': private texts and language variation in the past"], Sep 2021, Paris, France ; https://hal.inria.fr/hal-03357080 (2021)
	BASE

19	Expanding the content model of annotationBlock
	Bartz, Alexandre; Janes, Juliette; Romary, Laurent...
	In: Next Gen TEI, 2021 - TEI Conference and Members’ Meeting, Oct 2021, Virtual, United States ; https://hal.archives-ouvertes.fr/hal-03380805 (2021)
	BASE

20	Universal Dependencies 2.9
	Zeman, Daniel; Nivre, Joakim; Abrams, Mitchell. - : Universal Dependencies Consortium, 2021
	BASE

© 2013 - 2024 Lin|gu|is|tik