1 | Assessing the impact of OCR noise on multilingual event detection over digitised documents
In: ISSN: 1432-5012 ; EISSN: 1432-1300 ; International Journal on Digital Libraries ; https://hal.archives-ouvertes.fr/hal-03635985 ; International Journal on Digital Libraries, Springer Verlag, 2022, ⟨10.1007/s00799-022-00325-2⟩ (2022)
BASE

2 | Computational Measures of Deceptive Language: Prospects and Issues
In: ISSN: 2297-900X ; EISSN: 2297-900X ; Frontiers in Communication ; https://hal.archives-ouvertes.fr/hal-03629780 ; Frontiers in Communication, Frontiers, 2022, 7, pp.792378. ⟨10.3389/fcomm.2022.792378⟩ (2022)
BASE

3 | Multiword Expression Features for Automatic Hate Speech Detection
In: NLDB 2021 - 26th International Conference on Natural Language & Information Systems ; https://hal.archives-ouvertes.fr/hal-03231047 ; NLDB 2021 - 26th International Conference on Natural Language & Information Systems, Jun 2021, Saarbrücken/Virtual, Germany ; http://nldb2021.sb.dfki.de/ (2021)
BASE

4 | Hate speech and offensive language detection using transfer learning approaches ; Détection du discours de haine et du langage offensant utilisant des approches de Transfer Learning
In: https://tel.archives-ouvertes.fr/tel-03276023 ; Document and Text Processing. Institut Polytechnique de Paris, 2021. English. ⟨NNT : 2021IPPAS007⟩ (2021)
BASE

5 | A Multilingual Dataset for Named Entity Recognition, Entity Linking and Stance Detection in Historical Newspapers
In: SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval ; https://hal.archives-ouvertes.fr/hal-03418387 ; SIGIR '21: The 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Jul 2021, Virtual Event, Canada. pp.2328-2334, ⟨10.1145/3404835.3463255⟩ (2021)
BASE

6 | Impact Analysis of Document Digitization on Event Extraction ...
BASE

7 | Impact Analysis of Document Digitization on Event Extraction ...
BASE

8 | Fine-Grained Implicit Sentiment in Financial News: Uncovering Hidden Bulls and Bears
In: Electronics ; Volume 10 ; Issue 20 (2021)
BASE

9 | Analyzing Non-Textual Content Elements to Detect Academic Plagiarism
BASE

10 | Cross language plagiarism detection with contextualized word embeddings ; Detecção de plágio multilíngue usando word embeddings contextualizadas
BASE

11 | Web mining for social network analysis
Abstract:
The rapid development of information systems and the widespread use of electronic means and social networks have played a significant role in accelerating the pace of events worldwide, for example in the 2012 Gaza conflict (the 8-day war), the pro-secessionist rebellion in the 2013-2014 conflict in Eastern Ukraine, the 2016 US presidential elections, and the COVID-19 pandemic since the beginning of 2020. As the volume of data shared daily on social networking platforms in different languages grows quickly, techniques are needed to classify this data automatically, promptly, and correctly. Of the many social networking platforms, Twitter is one of the most used by netizens. It allows its users to communicate, share their opinions, and express their emotions (sentiments) in short blog posts, easily and at no cost. Moreover, unlike other social networking platforms, Twitter allows research institutions to access its public and historical data, upon request and under control. Therefore, many organizations at different levels (e.g., governmental, commercial) seek to benefit from the analysis and classification of shared tweets in many application domains, for example sentiment analysis, which determines users' polarity from the content of their shared text, and misleading information detection, which helps ensure the legitimacy and credibility of the shared information. To attain this objective, one can apply numerous data representation, preprocessing, and natural language processing techniques, together with machine and deep learning algorithms. Existing approaches face several challenges and limitations, including the management of tweets in multiple languages, the choice of features to include in the feature vector, and the assignment of representative and descriptive weights to these features for different mining tasks.
In addition, existing performance evaluation metrics are limited in their ability to fully assess the developed classification systems. In this dissertation, two novel frameworks are introduced: the first efficiently analyzes and classifies bilingual (Arabic and English) textual content of social networks, while the second evaluates the performance of binary classification algorithms. The first framework is designed with: (1) an approach to handle tweets written in Arabic and English, which can be extended to cover data written in more languages and from other social networking platforms; (2) effective data preparation and preprocessing techniques; (3) a novel feature selection technique that allows utilizing different types of features (content-dependent, context-dependent, and domain-dependent); and (4) a novel feature extraction technique that assigns weights to the linguistic features based on how representative they are of the classes they belong to. The proposed framework is employed in performing sentiment analysis and misleading information detection. Its performance is compared to state-of-the-art classification approaches on 11 benchmark datasets comprising both Arabic and English textual content, demonstrating considerable improvement across all performance evaluation metrics. The framework is then applied in a real-life case study to detect misleading information surrounding the spread of COVID-19. In the second framework, a new multidimensional classification assessment score (MCAS) is introduced. MCAS can determine how good a classification algorithm is when dealing with binary classification problems. It takes into consideration the effect of misclassification errors on the probability of correctly detecting instances from both classes. Moreover, it should be valid regardless of the size of the dataset and of whether the dataset has a balanced or unbalanced distribution of its instances over the classes.
An empirical and practical analysis is conducted on both synthetic and real-life datasets to compare the behavior of the proposed metric against that of commonly used metrics. The analysis reveals that the new measure can distinguish the performance of different classification techniques. Furthermore, it allows a class-based assessment of classification algorithms, that is, an assessment of an algorithm's ability to handle data from each class separately. This is useful when correctly classifying instances from one class is more important than classifying instances from the other, as in COVID-19 testing, where detecting positive patients is much more important than detecting negative ones. ; Graduate
Keyword: Coronavirus; COVID-19; Data Analysis; Data Mining; Fake News; Fake News Detection; Infodemic; Machine Learning; Misleading Information Detection; SARS-CoV-2; Sentiment Analysis; Social Media Mining; Social Network Analysis; Text Classification; Text Mining; Web Mining
URL: http://hdl.handle.net/1828/13219
BASE

12 | Application-Oriented Approach for Detecting Cyberaggression in Social Media
In: International Conference on Applied Human Factors and Ergonomics ; https://hal.archives-ouvertes.fr/hal-02903422 ; International Conference on Applied Human Factors and Ergonomics, Jul 2020, San Diego, United States. pp.129-136, ⟨10.1007/978-3-030-51328-3_19⟩ ; https://link.springer.com/chapter/10.1007%2F978-3-030-51328-3_19 (2020)
BASE

13 | Affective behavior modeling on social networks ; Modélisation des sentiments sur les réseaux sociaux
In: https://tel.archives-ouvertes.fr/tel-03339755 ; Social and Information Networks [cs.SI]. Université Montpellier, 2020. English. ⟨NNT : 2020MONTS073⟩ (2020)
BASE

14 | Impact Analysis of Document Digitization on Event Extraction
In: CEUR Workshop Proceedings ; 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2020) ; https://hal.archives-ouvertes.fr/hal-03026148 ; 4th Workshop on Natural Language for Artificial Intelligence (NL4AI 2020) co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2020), Nov 2020, Virtual, Italy. pp.17-28 ; http://sag.art.uniroma2.it/NL4AI/ (2020)
BASE

15 | Detecting deviations from activities of daily living routines using kinect depth maps and power consumption data
In: Research outputs 2014 to 2021 (2020)
BASE

18 | Sequence Covering for Efficient Host-Based Intrusion Detection
In: ISSN: 1556-6013 ; IEEE Transactions on Information Forensics and Security ; https://hal.archives-ouvertes.fr/hal-01653650 ; IEEE Transactions on Information Forensics and Security, Institute of Electrical and Electronics Engineers, 2019, 14 (4), pp.994-1006. ⟨10.1109/TIFS.2018.2868614⟩ ; https://ieeexplore.ieee.org/document/8454473 (2019)
BASE

19 | StoryMiner: An Automated and Scalable Framework for Story Analysis and Detection from Social Media
BASE

20 | A novel framework for biomedical entity sense induction
In: ISSN: 1532-0464 ; EISSN: 1532-0480 ; Journal of Biomedical Informatics ; https://hal-lirmm.ccsd.cnrs.fr/lirmm-01851988 ; Journal of Biomedical Informatics, Elsevier, 2018, 84, pp.31-41. ⟨10.1016/j.jbi.2018.06.007⟩ (2018)
BASE