
Search in the Catalogues and Directories

Hits 1 – 20 of 922

1
Towards combined semantic and lexical scores based on a new representation of textual data to extract experimental data from scientific publications
In: ISSN: 1751-5858 ; EISSN: 1751-5866 ; International Journal of Intelligent Information and Database Systems ; https://hal.inrae.fr/hal-03616243 ; International Journal of Intelligent Information and Database Systems, Inderscience, 2022, 15 (1), pp.78. ⟨10.1504/IJIIDS.2022.120146⟩ (2022)
BASE
2
Assessing the impact of OCR noise on multilingual event detection over digitised documents
In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
BASE
3
Introducing the HIPE 2022 Shared Task: Named Entity Recognition and Linking in Multilingual Historical Documents
In: Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II ; https://hal.archives-ouvertes.fr/hal-03635971 ; Matthias Hagen; Suzan Verberne; Craig Macdonald; Christin Seifert; Krisztian Balog; Kjetil Nørvåg; Vinay Setty. Advances in Information Retrieval. 44th European Conference on IR Research, ECIR 2022, Stavanger, Norway, April 10–14, 2022, Proceedings, Part II, 13186, Springer International Publishing, pp.347-354, 2022, Lecture Notes in Computer Science, 978-3-030-99738-0. ⟨10.1007/978-3-030-99739-7_44⟩ (2022)
BASE
4
HIPE-2022 Shared Task Named Entity Datasets ...
BASE
5
HIPE-2022 Shared Task Named Entity Datasets ...
BASE
6
HIPE-2022 Shared Task Named Entity Datasets ...
BASE
7
HIPE-2022 Shared Task Named Entity Datasets ...
BASE
8
Text Mining from Free Unstructured Text: An Experiment of Time Series Retrieval for Volcano Monitoring
In: Applied Sciences; Volume 12; Issue 7; Pages: 3503 (2022)
BASE
9
Sentence Boundary Extraction from Scientific Literature of Electric Double Layer Capacitor Domain: Tools and Techniques
In: Applied Sciences; Volume 12; Issue 3; Pages: 1352 (2022)
Abstract: Given the growth of scientific literature on the web, particularly in materials science, acquiring data precisely from the literature has become more significant. Material information systems, or chemical information systems, play an essential role in discovering data, materials, or synthesis processes using the existing scientific literature. Processing and understanding the natural language of scientific literature is the backbone of these systems, which depend heavily on appropriate textual content. Appropriate textual content means complete, meaningful sentences extracted from a large chunk of text. The process of detecting the beginning and end of a sentence and extracting it as a correct sentence is called sentence boundary extraction. The accurate extraction of sentence boundaries from PDF documents is essential for readability and natural language processing. Therefore, this study provides a comparative analysis of different tools for extracting text from PDF documents, which are available as Python libraries or packages and are widely used by the research community. The main objective is to find the most suitable of the available techniques for correctly extracting sentences from PDF files as text. The performance of the techniques PyPDF2, pdfminer.six, PyMuPDF, pdftotext, Tika, and GROBID is presented in terms of precision, recall, F1 score, run time, and memory consumption. The natural language processing (NLP) tools NLTK, spaCy, and Gensim are used to identify sentence boundaries. Of all the techniques studied, the GROBID PDF extraction package combined with the NLP tool spaCy achieved the highest F1 score of 93% and consumed the least memory, at 46.13 megabytes.
Keyword: gensim; material informatics; material information system; Materials 4.0; NLP in material science; NLTK; PDF to text conversion; sentence boundary extraction; Spacy
URL: https://doi.org/10.3390/app12031352
BASE
10
Analysis of the Full-Size Russian Corpus of Internet Drug Reviews with Complex NER Labeling Using Deep Learning Neural Networks and Language Models
In: Applied Sciences; Volume 12; Issue 1; Pages: 491 (2022)
BASE
11
Experiences on the Improvement of Logic-Based Anaphora Resolution in English Texts
In: Electronics; Volume 11; Issue 3; Pages: 372 (2022)
BASE
12
Semantic pattern discovery in open information extraction
Chauhan, Aabhas. - 2022
BASE
13
Topic models do not model topics: epistemological remarks and steps towards best practices
In: EISSN: 2416-5999 ; Journal of Data Mining and Digital Humanities ; https://hal.archives-ouvertes.fr/hal-03261599 ; Journal of Data Mining and Digital Humanities, Episciences.org, 2021, 2021, ⟨10.46298/jdmdh.7595⟩ (2021)
BASE
14
Indirectly Named Entity Recognition ; Reconnaissance d'entités indirectement nommées
In: ISSN: 2530-9455 ; Journal of Computer-Assisted Linguistic Research (JCLR) ; https://hal.archives-ouvertes.fr/hal-03476411 ; Journal of Computer-Assisted Linguistic Research (JCLR), Universitat Politècnica de València, 2021, 5 (1), pp.27-46. ⟨10.4995/JCLR.2021.15922⟩ ; https://polipapers.upv.es/index.php/jclr/index (2021)
BASE
15
Atténuer les erreurs de numérisation dans la reconnaissance d'entités nommées pour les documents historiques
In: Conférence en Recherche d'Informations et Applications (CORIA 2021) ; https://hal.archives-ouvertes.fr/hal-03320332 ; Conférence en Recherche d'Informations et Applications (CORIA 2021), ARIA : Association Francophone de Recherche d’Information (RI) et Applications, Apr 2021, Grenoble (virtuel), France. pp.1 - 7 ; http://coria.asso-aria.org/2021/articles/mini_24/main.pdf (2021)
BASE
16
WEIR-P: An Information Extraction Pipeline for the Wastewater Domain
In: RCIS 2021 - 5th International Conference on Research Challenges in Information Science ; https://hal.archives-ouvertes.fr/hal-03211461 ; RCIS 2021 - 5th International Conference on Research Challenges in Information Science, May 2021, Virtual, Cyprus (2021)
BASE
17
Mapping the evolution of topics published by Education for Information. Interdisciplinary Journal of Information Studies
In: ISSN: 0167-8329 ; Education for Information ; https://hal.archives-ouvertes.fr/hal-03392553 ; Education for Information, IOS Press, 2021 (2021)
BASE
18
LILLIE : information extraction and database integration using linguistics and learning-based algorithms ...
BASE
19
Exploring Construction of a Company Domain-Specific Knowledge Graph from Financial Texts Using Hybrid Information Extraction
Jen, Chun-Heng. - : KTH, Skolan för elektroteknik och datavetenskap (EECS), 2021
BASE
20
Arabic question answering system: a survey
Azmi, Aqil M.; Cambria, Erik; Hussain, Amir. - : Springer, 2021
BASE

