1 |
Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques [Alleviating Digitization Errors in Named Entity Recognition for Historical Documents]
|
|
|
|
In: Conférence en Recherche d'Informations et Applications (CORIA 2021), ARIA : Association Francophone de Recherche d’Information (RI) et Applications, Apr 2021, Grenoble (virtual), France. pp. 1-7 ; https://hal.archives-ouvertes.fr/hal-03320332 ; http://coria.asso-aria.org/2021/articles/mini_24/main.pdf (2021)
|
|
BASE
|
|
2 |
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
|
|
|
|
In: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp. 2328-2334, ⟨10.1145/3404835.3463255⟩ ; https://hal.archives-ouvertes.fr/hal-03418387 (2021)
|
|
BASE
|
|
3 |
MELHISSA: a multilingual entity linking architecture for historical press articles ...
|
|
|
|
BASE
|
|
4 |
MELHISSA: a multilingual entity linking architecture for historical press articles ...
|
|
|
|
BASE
|
|
5 |
Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers ...
|
|
|
|
BASE
|
|
6 |
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers ...
|
|
|
|
BASE
|
|
7 |
Annotation Guidelines for Named Entity Recognition, Entity Linking and Stance Detection ...
|
|
|
|
BASE
|
|
8 |
Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers ...
|
|
|
|
BASE
|
|
9 |
Annotation Guidelines for Named Entity Recognition, Entity Linking and Stance Detection ...
|
|
|
|
BASE
|
|
10 |
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers ...
|
|
|
|
BASE
|
|
11 |
Entity Linking for Historical Documents: Challenges and Solutions
|
|
|
|
In: 22nd International Conference on Asia-Pacific Digital Libraries, ICADL 2020, 12504, Springer, pp. 215-231, 2020, Lecture Notes in Computer Science, 978-3-030-64452-9. ⟨10.1007/978-3-030-64452-9_19⟩ ; https://hal.archives-ouvertes.fr/hal-03034492 (2020)
|
|
BASE
|
|
12 |
Robust Named Entity Recognition and Linking on Historical Multilingual Documents
|
|
|
|
In: Conference and Labs of the Evaluation Forum (CLEF 2020), Sep 2020, Thessaloniki, Greece. pp. 1-17, ⟨10.5281/zenodo.4068074⟩ ; https://hal.archives-ouvertes.fr/hal-03026969 ; http://ceur-ws.org/Vol-2696/paper_171.pdf (2020)
|
|
BASE
|
|
13 |
Robust Named Entity Recognition and Linking on Historical Multilingual Documents ...
|
|
|
|
BASE
|
|
14 |
Robust Named Entity Recognition and Linking on Historical Multilingual Documents ...
|
|
|
|
BASE
|
|
15 |
Benchmark for the evaluation of named entity recognition over ancient documents ...
|
|
|
|
BASE
|
|
16 |
Robust Named Entity Recognition and Linking on Historical Multilingual Documents ...
|
|
|
|
BASE
|
|
17 |
Benchmark for the evaluation of named entity recognition over ancient documents ...
|
|
|
|
Abstract:
The dataset consists of multilingual noisy corpora for named entity recognition (NER). The noisy versions are simulated from the CoNLL-02 (Spanish and Dutch) and CoNLL-03 (English) NER corpora. The original collections are re-OCRed, and four types of noise at two different levels are added to simulate varied OCR output. More precisely, the raw texts were first extracted and converted into images. These images were then contaminated with noise that commonly arises when using a scanner. The OCRed data was then extracted with the open-source Tesseract OCR engine, version 3.04.01. As a result of the noise inserted into the images, the OCRed data contains degradations. Finally, the original and noisy texts are aligned. This archive contains three folders (one per language). The folders contain the degraded images, the noisy texts extracted by the OCR engine, and their versions aligned with the clean data. These are the supplementary materials for the TPDL 2020 paper Assessing and minimizing the impact of OCR quality on named entity recognition. If ...
|
|
Keyword:
OCR, named entity recognition, noisy, degradation
|
|
URL: https://dx.doi.org/10.5281/zenodo.3877553 https://zenodo.org/record/3877553
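The abstract above says that original and noisy texts are aligned, but does not say how. As a rough illustration only, the following sketch aligns a clean token sequence with its OCR-degraded counterpart using Python's standard `difflib`. The function name `align_tokens` and the sample tokens are hypothetical, and this is not claimed to be the procedure used to build the dataset.

```python
from difflib import SequenceMatcher


def align_tokens(clean, noisy):
    """Pair clean tokens with noisy (OCRed) tokens.

    Hypothetical helper for illustration; gaps on either side are
    filled with None so both columns keep the same length.
    """
    matcher = SequenceMatcher(a=clean, b=noisy, autojunk=False)
    pairs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        left = list(clean[i1:i2])
        right = list(noisy[j1:j2])
        # For replace/delete/insert blocks, pad the shorter side with None.
        width = max(len(left), len(right))
        left += [None] * (width - len(left))
        right += [None] * (width - len(right))
        pairs.extend(zip(left, right))
    return pairs


if __name__ == "__main__":
    # Made-up example tokens, not taken from the dataset.
    clean = ["The", "European", "Commission", "said"]
    noisy = ["The", "Eur0pean", "Commission", "sa1d"]
    for clean_tok, noisy_tok in align_tokens(clean, noisy):
        print(clean_tok, noisy_tok, sep="\t")
```

A token-level alignment like this lets NER labels from the clean corpus be carried over to the corresponding noisy tokens; tokens that the OCR merged or split would need extra handling beyond this sketch.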
|
|
BASE
|
|
18 |
Robust Named Entity Recognition and Linking on Historical Multilingual Documents ...
|
|
|
|
BASE
|
|
19 |
Alleviating Digitization Errors in Named Entity Recognition for Historical Documents ...
|
|
|
|
BASE
|
|
20 |
Alleviating Digitization Errors in Named Entity Recognition for Historical Documents ...
|
|
|
|
BASE
|
|
|
|