1. The Ontology of Modernist Character: Deconstructing the Human in the British Novel, 1899-1934 ...
BASE
2. The Impact of Game Elements on Learner Motivation: Influence of Initial Motivation and Player Profile
In: EISSN: 1939-1382 ; IEEE Transactions on Learning Technologies ; https://hal.univ-lyon2.fr/hal-03579428 ; IEEE Transactions on Learning Technologies, Institute of Electrical and Electronics Engineers, In press, ⟨10.1109/TLT.2022.3153239⟩ (2022)
BASE
3. Vocal Expression of Affective States in Spontaneous Laughter reveals the Bright and the Dark Side of Laughter
BASE
4. How Spanish speakers express norms using generic person markers
In: Psychology Faculty Research and Scholarship (2022)
BASE
5. Political economy of smart cities and the Human Rights: from corporative technocracy to sensibility
In: Revista de Direito Internacional; v. 19, n. 1 (2022): International Law and climate litigation ; 2237-1036 ; 2236-997X (2022)
BASE
6. An Overview of Indian Spoken Language Recognition from Machine Learning Perspective
In: ISSN: 2375-4699 ; EISSN: 2375-4702 ; ACM Transactions on Asian and Low-Resource Language Information Processing ; https://hal.inria.fr/hal-03616853 ; ACM Transactions on Asian and Low-Resource Language Information Processing, ACM, In press, ⟨10.1145/3523179⟩ (2022)
BASE
7. Speech Perception and Implementation in a Virtual Medical Assistant
In: 6. ICAART – 14th International Conference on Agents and Artificial Intelligence ; https://hal.archives-ouvertes.fr/hal-03621550 ; 6. ICAART – 14th International Conference on Agents and Artificial Intelligence, Feb 2022, Vienna, Austria (2022)
BASE
8. A Novel Multimodal Approach for Studying the Dynamics of Curiosity in Small Group Learning
In: https://hal.inria.fr/hal-03536340 ; 2022 (2022)
BASE
9. Barriers and Facilitators to the Implementation of a Community Doula Program for Black and Pacific Islander Pregnant People in San Francisco: Findings from a Partnered Process Evaluation.
In: Maternal and child health journal, vol 26, iss 4 (2022)
BASE
10. From Disrupted Classrooms to Human-Machine Collaboration? The Pocket Calculator, Google Translate, and the Future of Language Education
In: L2 Journal, vol 14, iss 1 (2022)
BASE
11. Speaking Style Variability in Speaker Discrimination by Humans and Machines
BASE
12. Computational models of disfluencies : fillers and discourse markers in spoken language understanding ; Modèles computationnels des disfluences dans le traitement de la parole
In: https://tel.archives-ouvertes.fr/tel-03653211 ; Computer science. Institut Polytechnique de Paris, 2022. English. ⟨NNT : 2022IPPAT001⟩ (2022)
BASE
13. Genetic Neural Architecture Search for automatic assessment of human sperm images
In: ISSN: 0957-4174 ; Expert Systems with Applications ; https://hal.archives-ouvertes.fr/hal-03585035 ; Expert Systems with Applications, Elsevier, 2022 (2022)
BASE
14. Assessing the impact of OCR noise on multilingual event detection over digitised documents
In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
Abstract:
Event detection (ED) is a crucial natural language processing (NLP) task that involves identifying instances of specified types of events in text and classifying them into event types. Detecting events in digitised documents could enable historians to gather and combine a large amount of information into an integrated whole, a panoramic interpretation of the past. However, the degradation of digitised documents and the quality of optical character recognition (OCR) tools might hinder the performance of an event detection system. While several studies have addressed event detection in historical documents, the transcribed documents had to be validated by hand, which demanded considerable human expertise and labour-intensive manual work. Thus, in this study, we explore the robustness of two language-independent event detection models to OCR noise, over two datasets that cover different event types and multiple languages. We analyse their ability to mitigate problems caused by the low quality of digitised documents, and we simulate transcribed data by injecting synthetic noise into clean annotated text. To create the noisy synthetic data, we use four main types of noise that commonly occur after the digitisation process: Character Degradation, Bleed Through, Blur, and Phantom Character. Finally, we conclude that the imbalance of the datasets, the richness of the different annotation styles, and the language characteristics are the most important factors that can influence event detection in digitised documents.
Keyword:
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Digitised Documents; Event Detection; Information Extraction
URL: https://hal.archives-ouvertes.fr/hal-03635985/file/IJDL2022-Assessing%20the%20Impact%20of%20OCR%20Noise%20on%20Multilingual%20Event%20Detection%20over%20Digitised%20Documents.pdf https://doi.org/10.1007/s00799-022-00325-2 https://hal.archives-ouvertes.fr/hal-03635985/document https://hal.archives-ouvertes.fr/hal-03635985
BASE
15. Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
In: Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II ; https://hal.archives-ouvertes.fr/hal-03635971 ; Matthias Hagen; Suzan Verberne; Craig Macdonald; Christin Seifert; Krisztian Balog; Kjetil Nørvåg; Vinay Setty. Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, 13186, Springer International Publishing, pp.347-354, 2022, Lecture Notes in Computer Science, 978-3-030-99738-0. ⟨10.1007/978-3-030-99739-7_44⟩ (2022)
BASE
16. European Language Equality - Report on the French Language
In: https://hal.archives-ouvertes.fr/hal-03637776 ; [Research Report] CNRS - LISN. 2022 (2022)
BASE
17. Neural MT and Human Post-editing : a Method to Improve Editorial Quality
In: ISSN: 1134-8941 ; Interlingüística ; https://halshs.archives-ouvertes.fr/halshs-03603590 ; Interlingüística, Alacant [Spain] : Universitat Autònoma de Barcelona, 2022, pp.15-36 (2022)
BASE
18. Genetic continuity of Indo-Iranian speakers since the Iron Age in southern Central Asia
In: ISSN: 2045-2322 ; EISSN: 2045-2322 ; Scientific Reports ; https://hal.archives-ouvertes.fr/hal-03566556 ; Scientific Reports, Nature Publishing Group, 2022, 12, pp.733. ⟨10.1038/s41598-021-04144-4⟩ (2022)
BASE
19. Impact of gender transition on sexuality and diversity of practices: a qualitative analysis of reddit discussions
In: European Society for Sexual Medicine ESSM Congress 2022 ; https://hal.archives-ouvertes.fr/hal-03581962 ; European Society for Sexual Medicine ESSM Congress 2022, Feb 2022, Rotterdam, Netherlands. 2022 (2022)
BASE
20. Surnames in south-eastern France: structure of the rural population during the 19th century through isonymy
In: ISSN: 0021-9320 ; EISSN: 1469-7599 ; Journal of Biosocial Science ; https://hal.archives-ouvertes.fr/hal-03521816 ; Journal of Biosocial Science, Cambridge University Press (CUP), In press, ⟨10.1017/S0021932021000699⟩ (2022)
BASE