Hits 1 – 10 of 10

1	Handling Cross- and Out-of-Domain Samples in Thai Word Segmentation ...
	The Joint Conference of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing 2021; Chuangsuwanich, Ekapol; Limkonchotiwat, Peerat. - : Underline Science Inc., 2021
	BASE

2	Robust Fragment-Based Framework for Cross-lingual Sentence Retrieval ...
	The 2021 Conference on Empirical Methods in Natural Language Processing 2021; Chuangsuwanich, Ekapol; Limkonchotiwat, Peerat. - : Underline Science Inc., 2021
	BASE

3	Sentiment analysis for Urdu online reviews using deep learning models
	Safder, Iqra; Mehmood, Zainab; Sarwar, Raheem...
	In: 38 ; 8 ; 1 (2021)
	BASE

4	Handling cross and out-of-domain samples in Thai word segmentation
	Sarwar, Raheem; Phatthiyaphaibun, Wannaphong; Nutanong, Sarana...
	In: 1003 ; 1016 (2021)
	BASE

5	Linguistic features evaluation for hadith authenticity through automatic machine learning
	Mohamed, Emad; Sarwar, Raheem. - : Oxford University Press, 2021
	BASE

6	Robust fragment-based framework for cross-lingual sentence retrieval
	Nutanong, Sarana; Sarwar, Raheem; Phatthiyaphaibun, Wannaphong...
	In: Findings of the Association for Computational Linguistics: EMNLP 2021 ; 935 ; 944 (2021)
	BASE

7	Exploiting Tweet Sentiments in Altmetrics Large-Scale Data ...
	Hassan, Saeed-Ul; Aljohani, Naif Radi; Tarar, Usman Iqbal. - : arXiv, 2020
	BASE

8	Domain adaptation of Thai word segmentation models using stacked ensemble
	Limkonchotiwat, Peerat; Chuangsuwanich, Ekapol; Phatthiyaphaibun, Wannaphong...
	In: 3841 ; 3847 (2020)
	BASE

9	Native language identification of fluent and advanced non-native writers
	Sarwar, Raheem; Rutherford, Attapol T; Hassan, Saeed-Ul; Rakthanmanon, Thanawin; Nutanong, Sarana
	In: 19 ; 4 ; 1 (2020)
	Abstract: This is an accepted manuscript of an article published by ACM in ACM Transactions on Asian and Low-Resource Language Information Processing in April 2020, available online: https://doi.org/10.1145/3383202 The accepted version of the publication may differ from the final published version. ; Native Language Identification (NLI) aims at identifying the native languages of authors by analyzing their text samples written in a non-native language. Most existing studies investigate this task for educational applications such as second language acquisition and require the learner corpora. This article performs NLI in a challenging context of the user-generated-content (UGC) where authors are fluent and advanced non-native speakers of a second language. Existing NLI studies with UGC (i) rely on the content-specific/social-network features and may not be generalizable to other domains and datasets, (ii) are unable to capture the variations of the language-usage-patterns within a text sample, and (iii) are not associated with any outlier handling mechanism. Moreover, since there is a sizable number of people who have acquired non-English second languages due to the economic and immigration policies, there is a need to gauge the applicability of NLI with UGC to other languages. Unlike existing solutions, we define a topic-independent feature space, which makes our solution generalizable to other domains and datasets. Based on our feature space, we present a solution that mitigates the effect of outliers in the data and helps capture the variations of the language-usage-patterns within a text sample. Specifically, we represent each text sample as a point set and identify the top-k stylistically similar text samples (SSTs) from the corpus. We then apply the probabilistic k nearest neighbors’ classifier on the identified top-k SSTs to predict the native languages of the authors. 
To conduct experiments, we create three new corpora where each corpus is written in a different language, namely, English, French, and German. Our experimental studies show that our solution outperforms competitive methods and reports more than 80% accuracy across languages. ; Research funded by Higher Education Commission, and Grants for Development of New Faculty Staff at Chulalongkorn University | Digital Economy Promotion Agency (# MP-62-0003) | Thailand Research Funds (MRG6180266 and MRG6280175). ; Published version
	Keyword: author profiling; forensic investigation; native language identification; Stylometry; text classification
	URL: https://doi.org/10.1145/3383202 ; http://hdl.handle.net/2436/623710
	BASE

10	A scalable framework for stylometric analysis query processing
	Nutanong, Sarana; Yu, Chenyun; Sarwar, Raheem. - : IEEE, 2017
	BASE
