
Search in the Catalogues and Directories

Hits 1 – 20 of 177

1
Between words and characters: A Brief History of Open-Vocabulary Modeling and Tokenization in NLP
In: https://hal.inria.fr/hal-03540069 ; 2022 (2022)
BASE
2
Retrieving Speaker Information from Personalized Acoustic Models for Speech Recognition
In: IEEE ICASSP 2022 ; https://hal.archives-ouvertes.fr/hal-03539741 ; IEEE ICASSP 2022, 2022, Singapore, Singapore (2022)
BASE
3
Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources
In: https://hal.inria.fr/hal-03550289 ; 2022 (2022)
BASE
4
Automatic Normalisation of Early Modern French
In: https://hal.inria.fr/hal-03540226 ; 2022 (2022)
Abstract: Spelling normalisation is a useful step in the study and analysis of historical texts, whether the analysis is done manually by experts or automatically with downstream natural language processing (NLP) tools. Not only does it homogenise the variable spelling often found in historical texts, but it also facilitates the use of off-the-shelf contemporary NLP tools, provided that contemporary spelling conventions are used for normalisation. We present FreEMnorm, a new benchmark for the normalisation of Early Modern French (from the 17th century) into contemporary French, and provide a thorough comparison of three normalisation methods: ABA, an alignment-based approach, and MT approaches (both statistical and neural), including the extensive parameter searching that is often missing from the normalisation literature.
Keyword: [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; Digital Humanities; Historical; Machine Translation; Modern French; Normalisation; Spelling
URL: https://hal.inria.fr/hal-03540226/document
https://doi.org/10.5281/zenodo.5865428
https://hal.inria.fr/hal-03540226
https://hal.inria.fr/hal-03540226/file/LREC_2022_ModFr_Normalisation-18.pdf
BASE
5
From FreEM to D'AlemBERT: a Large Corpus and a Language Model for Early Modern French
In: Proceedings of the 13th Language Resources and Evaluation Conference ; https://hal.inria.fr/hal-03596653 ; Proceedings of the 13th Language Resources and Evaluation Conference, European Language Resources Association, Jun 2022, Marseille, France (2022)
BASE
6
Caveats of Measuring Semantic Change of Cognates and Borrowings using Multilingual Word Embeddings
In: LChange'22 - 3rd International Workshop on Computational Approaches to Historical Language Change 2022 ; https://hal.inria.fr/hal-03635005 ; LChange'22 - 3rd International Workshop on Computational Approaches to Historical Language Change 2022, May 2022, Dublin, Ireland (2022)
BASE
7
Towards a Cleaner Document-Oriented Multilingual Crawled Corpus
In: https://hal.inria.fr/hal-03536361 ; 2022 (2022)
BASE
8
Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
In: Proceedings of the International Workshop on Challenges & Perspectives in Creating Large Language Models 2022 (BigScience 2022) ; https://hal.inria.fr/hal-03639144 ; Proceedings of the International Workshop on Challenges & Perspectives in Creating Large Language Models 2022 (BigScience 2022), May 2022, Dublin, Ireland (2022)
BASE
9
Probing Multilingual Cognate Prediction Models
In: Findings of the Association for Computational Linguistics: ACL 2022 ; https://hal.inria.fr/hal-03614691 ; Findings of the Association for Computational Linguistics: ACL 2022, May 2022, Dublin, Ireland (2022)
BASE
10
Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
In: Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03527328 ; Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021), Jan 2022, Punta Cana, Dominican Republic ; https://aclanthology.org/2021.wnut-1.47/ (2022)
BASE
11
Rethinking Automatic Evaluation in Sentence Simplification
In: https://hal.inria.fr/hal-03199901 ; 2021 (2021)
BASE
12
Multilingual Unsupervised Sentence Simplification
In: https://hal.inria.fr/hal-03109299 ; 2021 (2021)
BASE
13
A dataset for automatic detection of places in (early) modern French texts ; Un jeu de données pour la détection automatique de lieux dans les textes français modernes
In: NASSCFL 2021 - 50th Annual North American Society for Seventeenth-Century French Literature Conference ; https://hal.archives-ouvertes.fr/hal-03187097 ; NASSCFL 2021 - 50th Annual North American Society for Seventeenth-Century French Literature Conference, NASSCFL, May 2021, Iowa City / Virtual, United States. pp.5 (2021)
BASE
14
MOR Digital: The Advent of a New Lexicographical Portuguese Project
In: eLex 2021 - Seventh biennial conference on electronic lexicography ; https://hal.inria.fr/hal-03195362 ; eLex 2021 - Seventh biennial conference on electronic lexicography, Jul 2021, Brno, Czech Republic ; https://elex.link/elex2021/ (2021)
BASE
15
Transport Optimal pour le Changement Sémantique à partir de Plongements Contextualisés ; Optimal Transport for Semantic Change from Contextualised Embeddings
In: Proceedings of the 28th Conference on Natural Language Processing (Actes de la 28e Conférence sur le Traitement Automatique des Langues Naturelles). Volume 1: Main Conference ; TALN 2021 - Traitement Automatique des Langues Naturelles ; https://hal.archives-ouvertes.fr/hal-03265889 ; TALN 2021 - Traitement Automatique des Langues Naturelles, Jun 2021, Lille / Virtual, France. pp.235-244 (2021)
BASE
16
MORDigital: The Advent of a New Lexicographical Portuguese Project
In: eLex 2021 - Seventh biennial conference on electronic lexicography ; https://hal.inria.fr/hal-03195362 ; eLex 2021 - Seventh biennial conference on electronic lexicography, Jul 2021, Brno, Czech Republic ; https://elex.link/elex2021/ (2021)
BASE
17
Ungoliant: An Optimized Pipeline for the Generation of a Very Large-Scale Multilingual Web Corpus
In: CMLC 2021 - 9th Workshop on Challenges in the Management of Large Corpora ; https://hal.inria.fr/hal-03301590 ; CMLC 2021 - 9th Workshop on Challenges in the Management of Large Corpora, Jul 2021, Limerick / Virtual, Ireland. ⟨10.14618/ids-pub-10468⟩ ; https://www.cl2021.org/ (2021)
BASE
18
The Zero Resource Speech Challenge 2021: Spoken language modelling
In: ISSN: 0162-8828 ; IEEE Transactions on Pattern Analysis and Machine Intelligence ; https://hal.inria.fr/hal-03329301 ; IEEE Transactions on Pattern Analysis and Machine Intelligence, Institute of Electrical and Electronics Engineers, 2021, pp.1-1. ⟨10.1109/TPAMI.2021.3083839⟩ (2021)
BASE
19
The Zero Resource Speech Challenge 2021: Spoken language modelling
In: Interspeech 2021 - Conference of the International Speech Communication Association ; https://hal.inria.fr/hal-03329301 ; Interspeech 2021 - Conference of the International Speech Communication Association, Aug 2021, Brno, Czech Republic. ⟨10.1109/TPAMI.2021.3083839⟩ (2021)
BASE
20
Early phonetic learning without phonetic categories -- Insights from large-scale simulations on realistic input
In: ISSN: 0027-8424 ; EISSN: 1091-6490 ; Proceedings of the National Academy of Sciences of the United States of America ; https://hal.archives-ouvertes.fr/hal-03070566 ; Proceedings of the National Academy of Sciences of the United States of America, National Academy of Sciences, 2021, 118 (7), pp.e2001844118. ⟨10.1073/pnas.2001844118⟩ (2021)
BASE

© 2013 - 2024 Lin|gu|is|tik | Imprint | Privacy Policy | Change privacy settings