Hits 1 – 20 of 102
1
Assessing the impact of OCR noise on multilingual event detection over digitised documents
Boros, Emanuela; Nguyen, Nhu Khoa; Lejeune, Gaël; Doucet, Antoine
In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
Abstract:
Event detection (ED) is a crucial task for natural language processing (NLP) and it involves the identification of instances of specified types of events in text and their classification into event types. The detection of events from digitised documents could enable historians to gather and combine a large amount of information into an integrated whole, a panoramic interpretation of the past. However, the level of degradation of digitised documents and the quality of the optical character recognition (OCR) tools might hinder the performance of an event detection system. While several studies have been performed in detecting events from historical documents, the transcribed documents needed to be hand-validated which implied a great effort of human expertise and manual labor-intensive work. Thus, in this study, we explore the robustness of two different event detection language-independent models to OCR noise, over two datasets that cover different event types and multiple languages. We aim at analysing their ability to mitigate problems caused by the low quality of the digitised documents and we simulate the existence of transcribed data, synthesised from clean annotated text, by injecting synthetic noise. For creating the noisy synthetic data, we chose to utilise four main types of noise that commonly occur after the digitisation process: Character Degradation, Bleed Through, Blur, and Phantom Character. Finally, we conclude that the imbalance of the datasets, the richness of the different annotation styles, and the language characteristics are the most important factors that can influence event detection in digitised documents.
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Digitised Documents; Event Detection; Information Extraction
URL:
https://hal.archives-ouvertes.fr/hal-03635985/file/IJDL2022-Assessing%20the%20Impact%20of%20OCR%20Noise%20on%20Multilingual%20Event%20Detection%20over%20Digitised%20Documents.pdf
https://doi.org/10.1007/s00799-022-00325-2
https://hal.archives-ouvertes.fr/hal-03635985/document
https://hal.archives-ouvertes.fr/hal-03635985
BASE
2
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine ...
In: Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II ; https://hal.archives-ouvertes.fr/hal-03635971 ; Matthias Hagen; Suzan Verberne; Craig Macdonald; Christin Seifert; Krisztian Balog; Kjetil Nørvåg; Vinay Setty. Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, 13186, Springer International Publishing, pp.347-354, 2022, Lecture Notes in Computer Science, 978-3-030-99738-0. ⟨10.1007/978-3-030-99739-7_44⟩ (2022)
BASE
3
Assessing the Impact of OCR Noise on Multilingual Event Detection over Digitised Documents ...
Boros, Emanuela; Nguyen, Nhu Khoa; Lejeune, Gaël. - : Zenodo, 2022
BASE
4
HIPE-2022 Shared Task Named Entity Datasets ...
Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine. - : Zenodo, 2022
BASE
5
Event Related Document Retrieval with Multilingual Real World Event Representation ...
Bernard, Guillaume; Suire, Cyrille; Faucher, Cyril. - : Zenodo, 2022
BASE
6
HIPE-2022 Shared Task Named Entity Datasets ...
Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine. - : Zenodo, 2022
BASE
7
Event Related Document Retrieval with Multilingual Real World Event Representation ...
Bernard, Guillaume; Suire, Cyrille; Faucher, Cyril. - : Zenodo, 2022
BASE
8
HIPE-2022 Shared Task Named Entity Datasets ...
Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine. - : Zenodo, 2022
BASE
9
Assessing the Impact of OCR Noise on Multilingual Event Detection over Digitised Documents ...
Boros, Emanuela; Nguyen, Nhu Khoa; Lejeune, Gaël. - : Zenodo, 2022
BASE
10
L3i at SemEval-2022 Task 11: Straightforward Additional Context for Multilingual Named Entity Recognition ...
Boros, Emanuela; Carlos-Emiliano Gonzalez-Gallardo; Moreno, Jose G.. - : Zenodo, 2022
BASE
11
L3i at SemEval-2022 Task 11: Straightforward Additional Context for Multilingual Named Entity Recognition ...
Boros, Emanuela; Carlos-Emiliano Gonzalez-Gallardo; Moreno, Jose G.. - : Zenodo, 2022
BASE
12
HIPE-2022 Shared Task Named Entity Datasets ...
Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine. - : Zenodo, 2022
BASE
13
HIPE-2022 Shared Task Named Entity Datasets
Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine ...
In: http://infoscience.epfl.ch/record/292174 (2022)
BASE
14
État de l'art du changement sémantique à partir de plongements contextualisés
Montariol, Syrielle; Doucet, Antoine; Allauzen, Alexandre
In: COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference ; https://hal.archives-ouvertes.fr/hal-03320337 ; COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference, Apr 2021, Grenoble (virtuel), France (2021)
BASE
15
L3i_LBPAM at the FinSim-2 task: Learning Financial Semantic Similarities with Siamese Transformers
Nguyen, Nhu Khoa; Boros, Emanuela; Lejeune, Gaël ...
In: WWW '21: Companion Proceedings of the Web Conference 2021 ; WWW '21: The Web Conference 2021 ; https://hal.sorbonne-universite.fr/hal-03256324 ; WWW '21: The Web Conference 2021, Apr 2021, Ljubljana (virtual), Slovenia. pp.302-306, ⟨10.1145/3442442.3451384⟩ (2021)
BASE
16
Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques
Boros, Emanuela; Hamdi, Ahmed; Linhares Pontes, Elvys ...
In: Conférence en Recherche d'Informations et Applications (CORIA 2021) ; https://hal.archives-ouvertes.fr/hal-03320332 ; Conférence en Recherche d'Informations et Applications (CORIA 2021), ARIA : Association Francophone de Recherche d’Information (RI) et Applications, Apr 2021, Grenoble (virtuel), France. pp.1 - 7 ; http://coria.asso-aria.org/2021/articles/mini_24/main.pdf (2021)
BASE
17
Multilingual Epidemic Event Extraction
Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine ...
In: Towards Open and Trustworthy Digital Societies. 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual Event, December 1–3, 2021, Proceedings ; https://hal.archives-ouvertes.fr/hal-03480551 ; Hao-Ren Ke; Chei Sian Lee; Kazunari Sugiyama. Towards Open and Trustworthy Digital Societies. 23rd International Conference on Asia-Pacific Digital Libraries, ICADL 2021, Virtual Event, December 1–3, 2021, Proceedings, 13133, Springer, pp.139-156, 2021, Lecture Notes in Computer Science, 978-3-030-91668-8. ⟨10.1007/978-3-030-91669-5_12⟩ (2021)
BASE
18
Étude comparative de méthodes de classification multilingue appliquées à l'épidémiologie
Mutuvi, Stephen; Boros, Emanuela; Doucet, Antoine ...
In: COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference ; https://hal.archives-ouvertes.fr/hal-03320343 ; COnférence en Recherche d'Informations et Applications - CORIA 2021, French Information Retrieval Conference, Apr 2021, Grenoble (virtuel), France (2021)
BASE
19
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
Hamdi, Ahmed; Linhares Pontes, Elvys; Boros, Emanuela ...
In: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ; https://hal.archives-ouvertes.fr/hal-03418387 ; SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp.2328-2334, ⟨10.1145/3404835.3463255⟩ (2021)
BASE
20
Event Related Document Retrieval with Multilingual Real World Event Representation
Bernard, Guillaume; Suire, Cyrille; Faucher, Cyril ...
In: 20th International Semantic Web Conference ; https://hal.archives-ouvertes.fr/hal-03415957 ; 20th International Semantic Web Conference, Oct 2021, Online, France ; https://iswc2021.semanticweb.org/ (2021)
BASE