Detection of Translator Stylometry using Pair-wise Comparative Classification and Network Motif Mining
El-Fiqi, Heba (Engineering & Information Technology, UNSW Canberra, UNSW). University of New South Wales, UNSW Canberra, Engineering & Information Technology, 2013
Abstract: Stylometry is the study of the unique linguistic styles and writing behaviours of individuals. The identification of translator stylometry has applications in fields such as intellectual property, education, and forensic linguistics. Despite the proliferation of research in the wider field of authorship attribution using computational linguistics techniques, the translator stylometry problem is more challenging, and the machine learning literature on the topic is insufficient. Some authors have even claimed that detecting who translated a piece of text is a problem with no solution, a claim we challenge in this thesis. In this thesis, we evaluated the use of existing lexical measures for the translator stylometry problem. It was found that vocabulary richness could not identify translator stylometry. This encouraged us to look for non-traditional representations in order to discover new features that reveal translator stylometry. Network motifs are small sub-graphs that aim at capturing the local structure of a real network. We designed an approach that transforms the text into a network and then identifies the distinctive patterns of a translator by employing network motif mining. During our investigations, we redefined the problem of translator stylometry identification as a new type of classification problem that we call the Comparative Classification Problem (CCP). In the pair-wise CCP (PWCCP), data are collected on two subjects. The classification problem is to decide, given a piece of evidence, which of the two subjects is responsible for it. The key difference between PWCCP and traditional binary problems is that hidden patterns can only be unmasked by comparing the instances as pairs. A modified C4.5 decision tree classifier, which we call PWC4.5, is then proposed for PWCCP. A comparison between detecting the translator using traditional classification and using PWCCP demonstrated a remarkable ability of PWCCP to discriminate between translators. The contributions of the thesis are: (1) providing an empirical study to evaluate the use of stylistic-based features for the problem of translator stylometry identification; (2) introducing network motif mining as an effective approach to detect translator stylometry; (3) proposing a modified C4.5 methodology for pair-wise comparative classification.
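The pipeline outlined in the abstract (text to network, motif counting, pair-wise comparison) can be illustrated with a minimal sketch. The abstract does not specify how the network is built, which motifs are mined, or how pairs are encoded, so everything below is an assumption made for illustration: a directed word-adjacency graph, a coarse grouping of connected 3-node subgraphs by shape and edge count, and a sign-based comparison of two translations. The function names `text_to_network`, `count_triad_motifs`, and `pairwise_features` are hypothetical, and this is not the thesis's PWC4.5 method.

```python
from collections import Counter
from itertools import combinations
import re


def text_to_network(text):
    """Assumed construction: directed edge u -> v when word v directly follows word u."""
    words = re.findall(r"[a-z']+", text.lower())
    return {(u, v) for u, v in zip(words, words[1:]) if u != v}


def count_triad_motifs(edges):
    """Count connected 3-node subgraphs, grouped by a coarse signature.

    The signature is (shape, number of directed edges), where shape is
    "triangle" if all three node pairs are linked and "chain" otherwise.
    This is a simplification, not a full isomorphism-class census.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)

    counts = Counter()
    seen = set()
    # Every connected triple has a node adjacent to the other two,
    # so enumerating neighbour pairs around each node finds all of them.
    for centre, nbrs in neighbours.items():
        for a, b in combinations(sorted(nbrs), 2):
            triple = frozenset((centre, a, b))
            if triple in seen:
                continue
            seen.add(triple)
            nodes = sorted(triple)
            n_edges = sum((x, y) in edges for x in nodes for y in nodes if x != y)
            closed = all(y in neighbours[x] for x, y in combinations(nodes, 2))
            counts[("triangle" if closed else "chain", n_edges)] += 1
    return counts


def pairwise_features(counts_a, counts_b):
    """Assumed pair encoding: +1, 0, or -1 per motif, by which translation has more."""
    keys = sorted(set(counts_a) | set(counts_b))
    return {k: (counts_a[k] > counts_b[k]) - (counts_a[k] < counts_b[k]) for k in keys}


if __name__ == "__main__":
    # Two hypothetical translations of the same source sentence.
    translation_a = "the old man walked to the sea and the sea was calm"
    translation_b = "the elderly man went down to the sea which lay calm"
    motifs_a = count_triad_motifs(text_to_network(translation_a))
    motifs_b = count_triad_motifs(text_to_network(translation_b))
    print(pairwise_features(motifs_a, motifs_b))
```

In this sketch the comparison is made between two translations of the same source text, so each feature records a relative difference rather than an absolute count. That mirrors the abstract's point that some patterns only become visible when instances are compared as pairs, although the actual encoding used in the thesis may differ.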
Keywords: C4.5; Classification Algorithms; Comparative Classification Problems; Computational Linguistics; Decision Trees; Machine Learning; Network Motifs; Paired Classification; Parallel Translations; Pattern Recognition; PWC4.5; Social Network Analysis; Stylometry Analysis; Translator Stylometry Identification
URL: http://handle.unsw.edu.au/1959.4/53020
https://unsworks.unsw.edu.au/fapi/datastream/unsworks:11698/SOURCE01?view=true
