1 |
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
In: Matthias Hagen; Suzan Verberne; Craig Macdonald; Christin Seifert; Krisztian Balog; Kjetil Nørvåg; Vinay Setty (eds.). Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II. Lecture Notes in Computer Science 13186, Springer International Publishing, 2022, pp. 347-354, ISBN 978-3-030-99738-0. ⟨10.1007/978-3-030-99739-7_44⟩ ; https://hal.archives-ouvertes.fr/hal-03635971 (2022)
2 |
État de l'art du changement sémantique à partir de plongements contextualisés [State of the art of semantic change from contextualized embeddings]
In: COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference, Apr 2021, Grenoble (virtual), France ; https://hal.archives-ouvertes.fr/hal-03320337 (2021)
3 |
Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques [Mitigating digitization errors in named entity recognition for historical documents]
In: Conférence en Recherche d'Informations et Applications (CORIA 2021), ARIA : Association Francophone de Recherche d’Information (RI) et Applications, Apr 2021, Grenoble (virtual), France. pp. 1-7 ; https://hal.archives-ouvertes.fr/hal-03320332 ; http://coria.asso-aria.org/2021/articles/mini_24/main.pdf (2021)
4 |
Multilingual Epidemic Event Extraction
In: Hao-Ren Ke; Chei Sian Lee; Kazunari Sugiyama (eds.). Towards Open and Trustworthy Digital Societies. 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual Event, December 1–3, 2021, Proceedings. Lecture Notes in Computer Science 13133, Springer, 2021, pp. 139-156, ISBN 978-3-030-91668-8. ⟨10.1007/978-3-030-91669-5_12⟩ ; https://hal.archives-ouvertes.fr/hal-03480551 (2021)
5 |
Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie [A comparative study of multilingual classification methods applied to epidemiology]
In: COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference, Apr 2021, Grenoble (virtual), France ; https://hal.archives-ouvertes.fr/hal-03320343 (2021)
6 |
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
In: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp. 2328-2334. ⟨10.1145/3404835.3463255⟩ ; https://hal.archives-ouvertes.fr/hal-03418387 (2021)
7 |
Dataset for Temporal Analysis of English-French Cognates
In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), May 2020, Marseille, France. pp. 855-859. ⟨10.5281/zenodo.3693650⟩ ; https://hal.archives-ouvertes.fr/hal-03026957 (2020)
8 |
A Dataset for Multi-lingual Epidemiological Event Extraction
In: Proceedings of the 12th Conference on Language Resources and Evaluation (LREC 2020), May 2020, Marseille, France. pp. 4139-4144 ; https://hal.archives-ouvertes.fr/hal-02732848 (2020)
9 |
Impact Analysis of Document Digitization on Event Extraction
In: CEUR Workshop Proceedings ; 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2020), Nov 2020, Virtual, Italy. pp. 17-28 ; https://hal.archives-ouvertes.fr/hal-03026148 ; http://sag.art.uniroma2.it/NL4AI/ (2020)
10 |
Entity Linking for Historical Documents: Challenges and Solutions
In: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020. Lecture Notes in Computer Science 12504, Springer, 2020, pp. 215-231, ISBN 978-3-030-64452-9. ⟨10.1007/978-3-030-64452-9_19⟩ ; https://hal.archives-ouvertes.fr/hal-03034492 (2020)
11 |
Robust Named Entity Recognition and Linking on Historical Multilingual Documents
In: Conference and Labs of the Evaluation Forum (CLEF 2020), Sep 2020, Thessaloniki, Greece. pp. 1-17. ⟨10.5281/zenodo.4068074⟩ ; https://hal.archives-ouvertes.fr/hal-03026969 ; http://ceur-ws.org/Vol-2696/paper_171.pdf (2020)
12 |
Linking Named Entities across Languages using Multilingual Word Embeddings
In: JCDL '20: The ACM/IEEE Joint Conference on Digital Libraries in 2020, Aug 2020, Wuhan, Hubei - Virtual event, China. pp. 329-332. ⟨10.1145/3383583.3398597⟩ ; https://hal.archives-ouvertes.fr/hal-03026933 ; https://dl.acm.org/doi/10.1145/3383583.3398597 (2020)
13 |
Evaluating the Impact of OCR Errors on Topic Modeling
In: Maturity and Innovation in Digital Libraries. 20th International Conference on Asia-Pacific Digital Libraries, ICADL 2018, Hamilton, New Zealand, November 19-22, 2018, Proceedings. 2018, pp. 3-14. ⟨10.1007/978-3-030-04257-8_1⟩ ; https://hal.archives-ouvertes.fr/hal-03025563 (2018)
Abstract:
Historical documents pose a challenge for character recognition for several reasons: font disparities across different materials, a lack of orthographic standards (the same word may be spelled in different ways), material quality, and the unavailability of lexicons of known historical spelling variants. As a result, optical character recognition (OCR) of these documents often yields unsatisfactory accuracy, rendering digital material only partially discoverable and the data it holds difficult to process. In this paper, we explore the impact of OCR errors on the identification of topics from a corpus of historical OCRed documents. Based on experiments performed on OCR text corpora, we observe that OCR noise negatively impacts the stability and coherence of topics generated by topic modeling algorithms, and we quantify the strength of this impact.
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-DL]Computer Science [cs]/Digital Libraries [cs.DL]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Text mining; Topic stability; Topic coherence; Topic modeling
URL: https://doi.org/10.1007/978-3-030-04257-8_1 https://hal.archives-ouvertes.fr/hal-03025563/document https://hal.archives-ouvertes.fr/hal-03025563/file/Mutuvi2018_Chapter_EvaluatingTheImpactOfOCRErrors%281%29.pdf https://hal.archives-ouvertes.fr/hal-03025563
14 |
Every Word has its History: Interactive Exploration and Visualization of Word Sense Evolution
In: The 27th ACM International Conference on Information and Knowledge Management (CIKM '18), Oct 2018, Torino, Italy. pp. 1899-1902. ⟨10.1145/3269206.3269218⟩ ; https://hal.archives-ouvertes.fr/hal-03025580 ; https://www.cikm2018.units.it/ (2018)
15 |
Neural Networks for Multi-Word Expression Detection
In: Proceedings of the 13th Workshop on Multiword Expressions (MWE 2017), Apr 2017, Valencia, Spain. pp. 60-65. ⟨10.18653/v1/W17-1707⟩ ; https://hal.archives-ouvertes.fr/hal-03025446 (2017)