1 |
Assessing the impact of OCR noise on multilingual event detection over digitised documents
In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
BASE
2 |
Computational Measures of Deceptive Language: Prospects and Issues
In: ISSN: 2297-900X ; EISSN: 2297-900X ; Frontiers in Communication ; https://hal.archives-ouvertes.fr/hal-03629780 ; Frontiers in Communication, Frontiers, 2022, 7, pp.792378. ⟨10.3389/fcomm.2022.792378⟩ (2022)
BASE
3 |
Multiword Expression Features for Automatic Hate Speech Detection
In: NLDB 2021 - 26th International Conference on Natural Language & Information Systems ; https://hal.archives-ouvertes.fr/hal-03231047 ; NLDB 2021 - 26th International Conference on Natural Language & Information Systems, Jun 2021, Saarbrücken/Virtual, Germany ; http://nldb2021.sb.dfki.de/ (2021)
BASE
4 |
Hate speech and offensive language detection using transfer learning approaches ; Détection du discours de haine et du langage offensant utilisant des approches de Transfer Learning
In: https://tel.archives-ouvertes.fr/tel-03276023 ; Document and Text Processing. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAS007⟩ (2021)
BASE
5 |
A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
In: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ; https://hal.archives-ouvertes.fr/hal-03418387 ; SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp.2328-2334, ⟨10.1145/3404835.3463255⟩ (2021)
BASE
6 |
Impact Analysis of Document Digitization on Event Extraction ...
BASE
7 |
Impact Analysis of Document Digitization on Event Extraction ...
BASE
8 |
Fine-Grained Implicit Sentiment in Financial News: Uncovering Hidden Bulls and Bears
In: Electronics ; Volume 10 ; Issue 20 (2021)
BASE
9 |
Analyzing Non-Textual Content Elements to Detect Academic Plagiarism
Abstract:
Identifying academic plagiarism is a pressing problem, among others, for research institutions, publishers, and funding organizations. Detection approaches proposed so far analyze lexical, syntactical, and semantic text similarity. These approaches find copied, moderately reworded, and literally translated text. However, reliably detecting disguised plagiarism, such as strong paraphrases, sense-for-sense translations, and the reuse of non-textual content and ideas, is an open research problem. The thesis addresses this problem by proposing plagiarism detection approaches that implement a different concept—analyzing non-textual content in academic documents, such as citations, images, and mathematical content. The thesis makes the following research contributions. It provides the most extensive literature review on plagiarism detection technology to date. The study presents the weaknesses of current detection approaches for identifying strongly disguised plagiarism. Moreover, the survey identifies a significant research gap regarding methods that analyze features other than text. Subsequently, the thesis summarizes work that initiated the research on analyzing non-textual content elements to detect academic plagiarism by studying citation patterns in academic documents. To enable plagiarism checks of figures in academic documents, the thesis introduces an image-based detection process that adapts itself to the forms of image similarity typically found in academic work. The process includes established image similarity assessments and newly proposed use-case-specific methods. To improve the identification of plagiarism in disciplines like mathematics, physics, and engineering, the thesis presents the first plagiarism detection approach that analyzes the similarity of mathematical expressions. 
To demonstrate the benefit of combining non-textual and text-based detection methods, the thesis describes the first plagiarism detection system that integrates the analysis of citation-based, image-based, math-based, and text-based document similarity. The system’s user interface employs visualizations that significantly reduce the effort and time users must invest in examining content similarity. To validate the effectiveness of the proposed detection approaches, the thesis presents five evaluations that use real cases of academic plagiarism and exploratory searches for unknown cases. Real plagiarism is committed by expert researchers with strong incentives to disguise their actions. Therefore, I consider the ability to identify such cases essential for assessing the benefit of any new plagiarism detection approach. The findings of these evaluations are as follows. Citation-based plagiarism detection methods considerably outperformed text-based detection methods in identifying translated, paraphrased, and idea plagiarism instances. Moreover, citation-based detection methods found nine previously undiscovered cases of academic plagiarism. The image-based plagiarism detection process proved effective for identifying frequently observed forms of image plagiarism for image types that authors typically use in academic documents. Math-based plagiarism detection methods reliably retrieved confirmed cases of academic plagiarism involving mathematical content and identified a previously undiscovered case. Math-based detection methods offered advantages for identifying plagiarism cases that text-based methods could not detect, particularly in combination with citation-based detection methods. These results show that non-textual content elements contain a high degree of semantic information, are language-independent, and largely immutable to the alterations that authors typically perform to conceal plagiarism. 
Analyzing non-textual content complements text-based detection approaches and increases the detection effectiveness, particularly for disguised forms of academic plagiarism.
Keywords:
Citation Analysis; Content-based Image Retrieval; Data mining; ddc:004; Digital libraries and archives; Document representation; Evaluation of retrieval results; Image search; Information extraction; Information integration; Information Visualization; Link and co-citation analysis; Math Retrieval; Mathematics retrieval; Multilingual and cross-lingual retrieval; Natural Language Processing; Near-duplicate and plagiarism detection; Open Source Software; Plagiarism Detection; Retrieval models and ranking; Surveys and overviews; User Interaction; Users and interactive retrieval; Web searching and information discovery; Web-based interaction
URL: https://doi.org/10.5281/zenodo.4913345 http://nbn-resolving.de/urn:nbn:de:bsz:352-2-ll951b8bh8s30
BASE
10 |
Cross language plagiarism detection with contextualized word embeddings ; Detecção de plágio multilíngue usando word embeddings contextualizadas
BASE
12 |
Application-Oriented Approach for Detecting Cyberaggression in Social Media
In: International Conference on Applied Human Factors and Ergonomics ; https://hal.archives-ouvertes.fr/hal-02903422 ; International Conference on Applied Human Factors and Ergonomics, Jul 2020, San Diego, United States. pp.129-136, ⟨10.1007/978-3-030-51328-3_19⟩ ; https://link.springer.com/chapter/10.1007%2F978-3-030-51328-3_19 (2020)
BASE
13 |
Affective behavior modeling on social networks ; Modélisation des sentiments sur les réseaux sociaux
In: https://tel.archives-ouvertes.fr/tel-03339755 ; Social and Information Networks [cs.SI]. Université Montpellier, 2020. English. ⟨NNT : 2020MONTS073⟩ (2020)
BASE
14 |
Impact Analysis of Document Digitization on Event Extraction
In: CEUR Workshop Proceedings ; 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2020) ; https://hal.archives-ouvertes.fr/hal-03026148 ; 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2020), Nov 2020, Virtual, Italy. pp.17-28 ; http://sag.art.uniroma2.it/NL4AI/ (2020)
BASE
15 |
Detecting deviations from activities of daily living routines using Kinect depth maps and power consumption data
In: Research outputs 2014 to 2021 (2020)
BASE
18 |
Sequence Covering for Efficient Host-Based Intrusion Detection
In: ISSN: 1556-6013 ; IEEE Transactions on Information Forensics and Security ; https://hal.archives-ouvertes.fr/hal-01653650 ; IEEE Transactions on Information Forensics and Security, Institute of Electrical and Electronics Engineers, 2019, 14 (4), pp.994-1006. ⟨10.1109/TIFS.2018.2868614⟩ ; https://ieeexplore.ieee.org/document/8454473 (2019)
BASE
19 |
StoryMiner: An Automated and Scalable Framework for Story Analysis and Detection from Social Media
BASE
20 |
A novel framework for biomedical entity sense induction
In: ISSN: 1532-0464 ; EISSN: 1532-0480 ; Journal of Biomedical Informatics ; https://hal-lirmm.ccsd.cnrs.fr/lirmm-01851988 ; Journal of Biomedical Informatics, Elsevier, 2018, 84, pp.31-41. ⟨10.1016/j.jbi.2018.06.007⟩ (2018)
BASE