1 |
A White-Box Sociolinguistic Model for Gender Detection
|
|
|
|
In: Applied Sciences; Volume 12; Issue 5; Pages: 2676 (2022)
|
|
BASE
|
|
2 |
Detection of Hate Speech Spreaders using Convolutional Neural Networks
|
|
|
|
BASE
|
|
3 |
Detection of Hate Speech Spreaders using convolutional neural networks
|
|
|
|
BASE
|
|
4 |
Native language identification of fluent and advanced non-native writers
|
|
|
|
In: 19 ; 4 ; 1 (2020)
|
|
BASE
|
|
5 |
Fine-Grained Analysis of Language Varieties and Demographics
|
|
|
|
BASE
|
|
7 |
DEVELOPMENT OF A MACHINE LEARNING ALGORITHM TO PREDICT AUTHOR'S AGE FROM TEXT ...
|
|
|
|
BASE
|
|
8 |
DEVELOPMENT OF A MACHINE LEARNING ALGORITHM TO PREDICT AUTHOR'S AGE FROM TEXT ...
|
|
|
|
BASE
|
|
9 |
Caracterização autoral a partir de textos utilizando redes neurais artificiais ; Author Profiling from texts using artificial neural networks
|
|
Dias, Rafael Felipe Sandroni. Biblioteca Digital de Teses e Dissertações da USP; Universidade de São Paulo; Escola de Artes, Ciências e Humanidades, 2019
|
|
BASE
|
|
11 |
Detección de género en Twitter basado en características multimodales [Gender detection on Twitter based on multimodal features]
|
|
|
|
BASE
|
|
12 |
A survey on author profiling, deception, and irony detection for the Arabic language
|
|
|
|
BASE
|
|
13 |
A Low Dimensionality Representation for Language Variety Identification
|
|
|
|
Abstract:
[EN] Language variety identification aims at labelling texts in a native language (e.g. Spanish, Portuguese, English) with its specific variation (e.g. Argentina, Chile, Mexico, Peru, Spain; Brazil, Portugal; UK, US). In this work we propose a low dimensionality representation (LDR) to address this task with five different varieties of Spanish: Argentina, Chile, Mexico, Peru and Spain. We compare our LDR method with common state-of-the-art representations and show an increase in accuracy of ~35%. Furthermore, we compare LDR with two reference distributed representation models. Experimental results show competitive performance while dramatically reducing the dimensionality (and increasing the big data suitability) to only 6 features per variety. Additionally, we analyse the behaviour of the employed machine learning algorithms and the most discriminating features. Finally, we employ an alternative dataset to test the robustness of our low dimensionality representation with another set of similar languages. ; The work of the first author was in the framework of ECOPORTUNITY IPT-2012-1220-430000. The work of the last two authors was in the framework of the SomEMBED MINECO TIN2015-71147-C2-1-P research project. This work has been also supported by the SomEMBED TIN2015-71147-C2-1-P MINECO research project and by the Generalitat Valenciana under the grant ALMAPATER (PrometeoII/2014/030). ; Rangel-Pardo, FM.; Franco-Salvador, M.; Rosso, P. (2018). A Low Dimensionality Representation for Language Variety Identification. Lecture Notes in Computer Science. 9624:156-169. https://doi.org/10.1007/978-3-319-75487-1_13
|
|
Keyword:
Author profiling; Big data; Language variety identification; LENGUAJES Y SISTEMAS INFORMATICOS; Low dimensionality representation; Similar languages discrimination; Social media
|
|
URL: http://hdl.handle.net/10251/146184 https://doi.org/10.1007/978-3-319-75487-1_13
|
|
BASE
|
|
14 |
Overview of PAN 2018. Author identification, author profiling, and author obfuscation
|
|
|
|
BASE
|
|
15 |
A survey on author profiling, deception, and irony detection for the Arabic language
|
|
|
|
BASE
|
|
16 |
"Is There Choice in Non-Native Voice?" Linguistic Feature Engineering and a Variationist Perspective in Automatic Native Language Identification ...
|
|
|
|
BASE
|
|
17 |
Feature engineering for author profiling and identification: on the relevance of syntax and discourse
|
|
|
|
In: TDX (Tesis Doctorals en Xarxa) (2017)
|
|
BASE
|
|
18 |
"Is There Choice in Non-Native Voice?" Linguistic Feature Engineering and a Variationist Perspective in Automatic Native Language Identification
|
|
|
|
BASE
|
|
19 |
Overview of PAN'17: Author Identification, Author Profiling, and Author Obfuscation
|
|
|
|
BASE
|
|
20 |
Author Profiling in Social Media: The Impact of Emotions on Discourse Analysis
|
|
|
|
BASE
|
|