1. Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0
In: Proceedings of the International Workshop on Challenges & Perspectives in Creating Large Language Models 2022 (BigScience 2022), May 2022, Dublin, Ireland ; https://hal.inria.fr/hal-03639144
Abstract:
In this work, we explore whether the recently demonstrated zero-shot abilities of the T0 model extend to Named Entity Recognition for out-of-distribution languages and time periods. Using a historical newspaper corpus in three languages as a test bed, we use prompts to extract possible named entities. Our results show that a naive approach to prompt-based zero-shot multilingual Named Entity Recognition is error-prone, but they also highlight the potential of such an approach for historical languages lacking labeled datasets. Moreover, we find that T0-like models can be probed to predict the publication date and language of a document, which could be very relevant for the study of historical texts.
Keywords:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-DB]Computer Science [cs]/Databases [cs.DB]; [SHS.LANGUE]Humanities and Social Sciences/Linguistics
URL: https://hal.inria.fr/hal-03639144
2. Entities, Dates, and Languages: Zero-Shot on Historical Texts with T0 ...
3. Quantifying Contextual Aspects of Inter-annotator Agreement in Intertextuality Research
In: Proceedings of the 5th Joint SIGHUM Workshop on Computational Linguistics for Cultural Heritage, Social Sciences, Humanities and Literature (LaTeCH-CLfL 2021), Punta Cana, Dominican Republic, 2021 ; DOI: 10.18653/v1/2021.latechclfl-1.4 ; https://halshs.archives-ouvertes.fr/halshs-03636967 ; https://aclanthology.org/2021.latechclfl-1.4
4. Quantifying Contextual Aspects of Inter-annotator Agreement in Intertextuality Research ...
5. More Data and New Tools. Advances in Parsing the Index Thomisticus Treebank
6. Supplementary Materials of "Classifying Evolutionary Forces in Language Change Using Neural Networks" ...
7. Supplementary Materials of "Classifying Evolutionary Forces in Language Change Using Neural Networks" ...
8. Overview of PAN 2020: Authorship Verification, Celebrity Profiling, Profiling Fake News Spreaders on Twitter, and Style Change Detection
9. On the Feasibility of Automated Detection of Allusive Text Reuse ...
10. Overview of PAN 2019: Bots and Gender Profiling, Celebrity Profiling, Cross-domain Authorship Attribution and Style Change Detection