Search in the Catalogues and Directories

Hits 1 – 20 of 5,129

1	ViQuAE, a Dataset for Knowledge-based Visual Question Answering about Named Entities
	Lerner, Paul; Ferret, Olivier; Guinaudeau, Camille...
	In: ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR ’22) ; https://hal-universite-paris-saclay.archives-ouvertes.fr/hal-03650618 ; 2022 (2022)
	BASE

2	Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications
	Lentschat, Martin; Buche, Patrice; Dibie-Barthelemy, Juliette...
	In: ISSN: 1751-5858 ; EISSN: 1751-5866 ; International Journal of Intelligent Information and Database Systems ; https://hal.inrae.fr/hal-03616243 ; International Journal of Intelligent Information and Database Systems, Inderscience, 2022, 15 (1), pp.78. ⟨10.1504/IJIIDS.2022.120146⟩ (2022)
	BASE

3	Obvie: interface web pour la fouille et la comparaison de textes [Obvie: a web interface for text mining and text comparison]
	Alrahabi, Motasem
	In: Atelier DigitAl Humanities and cuLtural herItAge: data and knowledge management and analysis durant la conférence francophone sur l'Extraction et la Gestion des Connaissances (egc2022) ; https://hal.archives-ouvertes.fr/hal-03543362 ; Atelier DigitAl Humanities and cuLtural herItAge: data and knowledge management and analysis durant la conférence francophone sur l'Extraction et la Gestion des Connaissances (egc2022), Jan 2022, Blois, France ; https://egc2022.univ-tours.fr/ateliers/ (2022)
	BASE

4	Preprint Citation Praxis in PLOS
	Bertin, Marc; Atanassova, Iana
	In: ISSN: 0138-9130 ; EISSN: 1588-2861 ; Scientometrics ; https://hal.archives-ouvertes.fr/hal-03506094 ; In press (2022)
	BASE

5	Islands and Bridges of Language: Bio-Inspired Structural Analysis of Language Embedding Data
	Zhou, Hongwei. eScholarship, University of California, 2022
	BASE

6	Assessing the impact of OCR noise on multilingual event detection over digitised documents
	Boros, Emanuela; Nguyen, Nhu Khoa; Lejeune, Gaël; Doucet, Antoine
	In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
	Abstract: Event detection (ED) is a crucial task for natural language processing (NLP) and it involves the identification of instances of specified types of events in text and their classification into event types. The detection of events from digitised documents could enable historians to gather and combine a large amount of information into an integrated whole, a panoramic interpretation of the past. However, the level of degradation of digitised documents and the quality of the optical character recognition (OCR) tools might hinder the performance of an event detection system. While several studies have been performed in detecting events from historical documents, the transcribed documents needed to be hand-validated which implied a great effort of human expertise and manual labor-intensive work. Thus, in this study, we explore the robustness of two different event detection language-independent models to OCR noise, over two datasets that cover different event types and multiple languages. We aim at analysing their ability to mitigate problems caused by the low quality of the digitised documents and we simulate the existence of transcribed data, synthesised from clean annotated text, by injecting synthetic noise. For creating the noisy synthetic data, we chose to utilise four main types of noise that commonly occur after the digitisation process: Character Degradation, Bleed Through, Blur, and Phantom Character. Finally, we conclude that the imbalance of the datasets, the richness of the different annotation styles, and the language characteristics are the most important factors that can influence event detection in digitised documents.
	Keyword: [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; [INFO.INFO-CL]Computer Science [cs]/Computation and Language [cs.CL]; [INFO.INFO-HC]Computer Science [cs]/Human-Computer Interaction [cs.HC]; [INFO.INFO-IR]Computer Science [cs]/Information Retrieval [cs.IR]; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [INFO.INFO-TT]Computer Science [cs]/Document and Text Processing; Digitised Documents; Event Detection; Information Extraction
	URL: https://hal.archives-ouvertes.fr/hal-03635985/file/IJDL2022-Assessing%20the%20Impact%20of%20OCR%20Noise%20on%20Multilingual%20Event%20Detection%20over%20Digitised%20Documents.pdf https://doi.org/10.1007/s00799-022-00325-2 https://hal.archives-ouvertes.fr/hal-03635985/document https://hal.archives-ouvertes.fr/hal-03635985
	BASE

7	Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
	Ehrmann, Maud; Romanello, Matteo; Doucet, Antoine...
	In: Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II ; https://hal.archives-ouvertes.fr/hal-03635971 ; Matthias Hagen; Suzan Verberne; Craig Macdonald; Christin Seifert; Krisztian Balog; Kjetil Nørvåg; Vinay Setty. Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, 13186, Springer International Publishing, pp.347-354, 2022, Lecture Notes in Computer Science, 978-3-030-99738-0. ⟨10.1007/978-3-030-99739-7_44⟩ (2022)
	BASE

8	Can Character-based Language Models Improve Downstream Task Performance in Low-Resource and Noisy Language Scenarios?
	Riabi, Arij; Sagot, Benoît; Seddah, Djamé
	In: Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021) ; https://hal.inria.fr/hal-03527328 ; Seventh Workshop on Noisy User-generated Text (W-NUT 2021, colocated with EMNLP 2021), Jan 2022, Punta Cana, Dominican Republic ; https://aclanthology.org/2021.wnut-1.47/ (2022)
	BASE

9	Between History and Natural Language Processing: Study, Enrichment and Online Publication of French Parliamentary Debates of the Early Third Republic (1881-1899)
	Puren, Marie; Bourgeois, Nicolas; Pellet, Aurélien...
	In: ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora ; https://hal.archives-ouvertes.fr/hal-03623351 ; ParlaCLARIN III at LREC2022 - Workshop on Creating, Enriching and Using Parliamentary Corpora, Jun 2022, Marseille, France ; https://www.clarin.eu/ParlaCLARIN-III (2022)
	BASE

10	A Dataset for Toponym Resolution in Nineteenth-Century English Newspapers
	McDonough, Katherine; Wilson, Daniel C. S.; Lawrence, Jon...
	In: Journal of Open Humanities Data; Vol 8 (2022); 3 ; 2059-481X (2022)
	BASE

11	Cross-media Scientific Research Achievements Query based on Ranking Learning ...
	Wang, Benzhi; Liang, Meiyu; Li, Ang. arXiv, 2022
	BASE

12	Exploring Sub-skeleton Trajectories for Interpretable Recognition of Sign Language ...
	Gudmundsson, Joachim; Seybold, Martin P.; Pfeifer, John. arXiv, 2022
	BASE

13	Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers ...
	Vitiugin, Fedor; Castillo, Carlos. arXiv, 2022
	BASE

14	Simplifying Multilingual News Clustering Through Projection From a Shared Space ...
	Santos, João; Mendes, Afonso; Miranda, Sebastião. arXiv, 2022
	BASE

15	Towards Best Practices for Training Multilingual Dense Retrieval Models ...
	Zhang, Xinyu; Ogueji, Kelechi; Ma, Xueguang. arXiv, 2022
	BASE

16	Addressing Issues of Cross-Linguality in Open-Retrieval Question Answering Systems For Emergent Domains ...
	Albalak, Alon; Levy, Sharon; Wang, William Yang. arXiv, 2022
	BASE

17	C3: Continued Pretraining with Contrastive Weak Supervision for Cross Language Ad-Hoc Retrieval ...
	Yang, Eugene; Nair, Suraj; Chandradevan, Ramraj. arXiv, 2022
	BASE

18	Parameter-Efficient Neural Reranking for Cross-Lingual and Multilingual Retrieval ...
	Litschko, Robert; Vulić, Ivan; Glavaš, Goran. arXiv, 2022
	BASE

19	QALD-9-plus: A Multilingual Dataset for Question Answering over DBpedia and Wikidata Translated by Native Speakers ...
	Perevalov, Aleksandr; Diefenbach, Dennis; Usbeck, Ricardo. arXiv, 2022
	BASE

20	MuMiN: A Large-Scale Multilingual Multimodal Fact-Checked Misinformation Social Network Dataset ...
	Nielsen, Dan Saattrup; McConville, Ryan. arXiv, 2022
	BASE


© 2013 - 2024 Lin|gu|is|tik