21.
Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques (Mitigating digitization errors in named entity recognition for historical documents)
In: Conférence en Recherche d'Informations et Applications (CORIA 2021), ARIA : Association Francophone de Recherche d’Information (RI) et Applications, Apr 2021, Grenoble (virtual), France, pp. 1-7 ; https://hal.archives-ouvertes.fr/hal-03320332 ; http://coria.asso-aria.org/2021/articles/mini_24/main.pdf (2021)

23.
Multilingual Epidemic Event Extraction
In: Hao-Ren Ke; Chei Sian Lee; Kazunari Sugiyama (eds.). Towards Open and Trustworthy Digital Societies. 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual Event, December 1–3, 2021, Proceedings. Lecture Notes in Computer Science, 13133, Springer, 2021, pp. 139-156, 978-3-030-91668-8. ⟨10.1007/978-3-030-91669-5_12⟩ ; https://hal.archives-ouvertes.fr/hal-03480551 (2021)
Abstract:
In this paper, we focus on epidemic event extraction in multilingual and low-resource settings. The task of extracting epidemic events is defined as the detection of disease names and locations in a document. We experiment with a multilingual dataset comprising news articles from the medical domain with diverse morphological structures (Chinese, English, French, Greek, Polish, and Russian). We investigate various Transformer-based models, also adopting a two-stage strategy, first finding the documents that contain events and then performing event extraction. Our results show that error propagation to the downstream task was higher than expected. We also perform an in-depth analysis of the results, concluding that different entity characteristics can influence the performance. Moreover, we perform several preliminary experiments for the low-resourced languages present in the dataset using the mean teacher semi-supervised technique. Our findings show the potential of pre-trained language models benefiting from the incorporation of unannotated data in the training process.
Keywords:
Computer Science [cs] / Artificial Intelligence [cs.AI]; Computer Science [cs] / Computation and Language [cs.CL]; Computer Science [cs] / Digital Libraries [cs.DL]; Computer Science [cs] / Human-Computer Interaction [cs.HC]; Computer Science [cs] / Information Retrieval [cs.IR]; Computer Science [cs] / Machine Learning [cs.LG]; Computer Science [cs] / Document and Text Processing; Epidemiological surveillance; Multilingualism; Semi-supervised learning
URL: https://hal.archives-ouvertes.fr/hal-03480551 ; https://hal.archives-ouvertes.fr/hal-03480551/document ; https://hal.archives-ouvertes.fr/hal-03480551/file/paper_46.pdf ; https://doi.org/10.1007/978-3-030-91669-5_12

24.
Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie (Comparative study of multilingual classification methods applied to epidemiology)
In: COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference, Apr 2021, Grenoble (virtual), France ; https://hal.archives-ouvertes.fr/hal-03320343 (2021)

25.
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
In: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada, pp. 2328-2334, ⟨10.1145/3404835.3463255⟩ ; https://hal.archives-ouvertes.fr/hal-03418387 (2021)

26.
Modelling repeated paired phonetic measures using linear mixed models with correlated errors
In: Case Studies in Business, Industry and Government Statistics, Société Française de Statistique, 2021, 8, pp. 28-46 ; ISSN: 2152-372X ; https://hal.archives-ouvertes.fr/hal-03235741 ; http://csbigs.fr/article/view/811 (2021)

27.
From disparate disciplines to unity in diversity: How the PARTHENOS project has brought European humanities Research Infrastructures together
In: International Journal of Humanities and Arts Computing, Edinburgh University Press, 2021, 15 (1-2), pp. 101-116. ⟨10.3366/ijhac.2021.0264⟩ ; ISSN: 1753-8548 ; EISSN: 1755-1706 ; https://hal.inria.fr/hal-03402145 (2021)

28.
Topic modelling on archive documents from the 1970s: global policies on refugees
In: Digital Scholarship in the Humanities, Oxford University Press, 2021, 36 (4), pp. 886-904. ⟨10.1093/llc/fqab018⟩ ; ISSN: 2055-7671 ; EISSN: 2055-768X ; https://hal.archives-ouvertes.fr/hal-03435806 (2021)

29.
Identification et gestion des données personnelles dans les textes : modèle sémantique et applications (Identification and management of personal data in texts: semantic model and applications)
In: CiDE.22 : 22ème édition du Colloque International sur le Document Electronique Données Documents Connaissances : Perspectives de recherche et d’enseignement, Dec 2021, Paris, France ; https://hal.archives-ouvertes.fr/hal-03506075 (2021)

30.
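Data Papers et dissémination des données de la recherche : quelles pratiques en SHS ? (Data papers and the dissemination of research data: what practices in the humanities and social sciences?)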
In: Colloque DHNord2021 : Publier, partager, réutiliser les données de la recherche : les data papers et leurs enjeux, Nov 2021, virtual, France ; https://hal.archives-ouvertes.fr/hal-03506077 (2021)

31.
Latin vs. Russian: the Languages of Rhetoric Classes in 18th Century Russian Seminaries ; Латынь vs русский: языки класса риторики в русских семинариях XVIII века
In: Slověne = Словѣне. International Journal of Slavic Studies, Vol 10, No 2 (2021), pp. 338-352 ; ISSN: 2305-6754 ; 2304-0785 (2021)

33.
MELHISSA: a multilingual entity linking architecture for historical press articles ...

34.
Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana (report) ...

35.
MELHISSA: a multilingual entity linking architecture for historical press articles ...

36.
Risorse in rete sull'analisi dei testi e la ricerca qualitativa (Online resources on text analysis and qualitative research). ...

37.
Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana (poster) ...

39.
Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana (dataset) ...

40.
Automatic translation and multilingual cultural heritage retrieval: a case study with transcriptions in Europeana (poster) ...