41 |
On the Use of Character n-grams as the only Intrinsic Evidence of Plagiarism
BASE
42 |
Online Hate Speech against Women: Automatic Identification of Misogyny and Sexism on Twitter
43 |
On the use of word embedding for cross language plagiarism detection
44 |
Overview of PAN 2019: Bots and Gender Profiling, Celebrity Profiling, Cross-domain Authorship Attribution and Style Change Detection
46 |
Improving Attitude Words Classification for Opinion Mining using Word Embedding
47 |
Classifier combination approach for question classification for Bengali question answering system
49 |
Paraphrase Plagiarism Identification with Character-level Features
50 |
UH-PRHLT at SemEval-2016 Task 3: Combining Lexical and Semantic-based Features for Community Question Answering ...
51 |
A Resource-Light Method for Cross-Lingual Semantic Textual Similarity ...
52 |
A Low Dimensionality Representation for Language Variety Identification
53 |
Overview of PAN 2018. Author identification, author profiling, and author obfuscation
54 |
A resource-light method for cross-lingual semantic textual similarity
Abstract:
Recognizing semantically similar sentences or paragraphs across languages is beneficial for many tasks, ranging from cross-lingual information retrieval and plagiarism detection to machine translation. Recently proposed methods for predicting cross-lingual semantic similarity of short texts, however, make use of tools and resources (e.g., machine translation systems, syntactic parsers or named entity recognition) that for many languages (or language pairs) do not exist. In contrast, we propose an unsupervised and very resource-light approach for measuring semantic similarity between texts in different languages. To operate in the bilingual (or multilingual) space, we project continuous word vectors (i.e., word embeddings) from one language to the vector space of the other language via the linear translation model. We then align words according to the similarity of their vectors in the bilingual embedding space and investigate different unsupervised measures of semantic similarity exploiting bilingual embeddings and word alignments. Requiring only a limited-size set of word translation pairs between the languages, the proposed approach is applicable to virtually any pair of languages for which there exists a sufficiently large corpus, required to learn monolingual word embeddings. Experimental results on three different datasets for measuring semantic textual similarity show that our simple resource-light approach reaches performance close to that of supervised and resource-intensive methods, displaying stability across different language pairs. Furthermore, we evaluate the proposed method on two extrinsic tasks, namely extraction of parallel sentences from comparable corpora and cross-lingual plagiarism detection, and show that it yields performance comparable to that of complex resource-intensive state-of-the-art models for the respective tasks. (C) 2017 Published by Elsevier B.V.
Acknowledgements: Part of the work presented in this article was performed during the second author's research visit to the University of Mannheim, supported by a Contact Fellowship awarded by the DAAD scholarship program "STIBET Doktoranden". The research of the last author was carried out in the framework of the SomEMBED project (TIN2015-71147-C2-1-P). Furthermore, this work was partially funded by the Junior-professor funding programme of the Ministry of Science, Research and the Arts of the state of Baden-Württemberg (project "Deep semantic models for high-end NLP application").
Citation: Glavas, G.; Franco-Salvador, M.; Ponzetto, SP.; Rosso, P. (2018). A resource-light method for cross-lingual semantic textual similarity. Knowledge-Based Systems. 143:1-9. https://doi.org/10.1016/j.knosys.2017.11.041
Keyword:
Cross-lingual Word embeddings; LENGUAJES Y SISTEMAS INFORMATICOS; Plagiarism detection; Semantic textual similarity; Word alignment; Parallel sentences alignment
URL: http://hdl.handle.net/10251/146277 https://doi.org/10.1016/j.knosys.2017.11.041
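Illustration: the abstract of this record describes projecting word embeddings from one language into the vector space of the other with a linear translation model, then aligning words by vector similarity. The following is a minimal sketch of that general idea, not the authors' implementation. It assumes NumPy, fits the projection by ordinary least squares, and uses a greedy one-to-one alignment averaged over the longer sentence as the similarity measure. The function names, the toy random vectors, and the scoring rule are all hypothetical choices made for illustration.

```python
import numpy as np


def learn_translation_matrix(src_vecs, tgt_vecs):
    """Least-squares fit of W so that src_vecs @ W approximates tgt_vecs.

    Each row pair is the embedding of a known word translation pair.
    """
    W, *_ = np.linalg.lstsq(src_vecs, tgt_vecs, rcond=None)
    return W


def cosine_matrix(a, b):
    """Pairwise cosine similarities between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def aligned_similarity(src_sent, tgt_sent, W):
    """Greedy one-to-one word alignment score (an illustrative measure).

    Source word vectors are projected with W, the most similar unaligned
    word pair is picked repeatedly, and the summed similarity is divided
    by the length of the longer sentence so unaligned words lower the score.
    """
    sims = cosine_matrix(src_sent @ W, tgt_sent)
    total = 0.0
    for _ in range(min(sims.shape)):
        i, j = np.unravel_index(np.argmax(sims), sims.shape)
        total += sims[i, j]
        sims[i, :] = -np.inf  # source word i is now aligned
        sims[:, j] = -np.inf  # target word j is now aligned
    return total / max(sims.shape)


# Toy data (assumption): the "source language" space is an exact rotation
# of the "target language" space, so a linear map can recover it.
rng = np.random.default_rng(0)
tgt_vocab = rng.normal(size=(40, 4))
rotation, _ = np.linalg.qr(rng.normal(size=(4, 4)))
src_vocab = tgt_vocab @ rotation.T

# Learn the projection from 20 translation pairs only.
W = learn_translation_matrix(src_vocab[:20], tgt_vocab[:20])

# Same three words in a different order versus three unrelated words.
same = aligned_similarity(src_vocab[[20, 21, 22]], tgt_vocab[[22, 20, 21]], W)
other = aligned_similarity(src_vocab[[20, 21, 22]], tgt_vocab[[30, 31, 32]], W)
print(same, other)
```

With real embeddings the two spaces are not exact rotations of each other, so the learned projection is only approximate and the scores are noisier than in this toy setup.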
55 |
A Knowledge-Based Weighted KNN for Detecting Irony in Twitter
56 |
Character N-Grams for Detecting Deceptive Controversial Opinions
57 |
Code Mixed Cross Script Factoid Question Classification - A Deep Learning Approach
58 |
A survey on author profiling, deception, and irony detection for the Arabic language
59 |
Semantically-informed distance and similarity measures for paraphrase plagiarism identification
60 |
A Multilevel Approach to Sentiment Analysis of Figurative Language in Twitter