Hits 1 – 11 of 11

1	Literacy and Access to the Opportunities of Democracy
	Duffy, Christopher E
	In: Internship Reflection Papers (2021)
	BASE

2	Humor, creativity and lexical creation ; Humour, créativité et création lexicale
	Brisset, Frédérique; Bordet, Lucile. - : HAL CCSD, 2021. : Université Jean-Moulin-Lyon III - Centre d’Études Linguistiques (CEL), 2021
	In: Lexis. Journal in English Lexicology (2021), Humor, creativity and lexical creation ; ISSN: 1951-6215 ; EISSN: 1951-6215 ; Centre d’Études Linguistiques (Linguistics Research Center), University of Lyon ; DOI: 10.4000/lexis.5585 ; https://hal.archives-ouvertes.fr/hal-02994959 ; https://journals.openedition.org/lexis/3602
	BASE

3	Automatic Language Identification in Code-Switched Hindi-English Social Media Text
	Nguyen, Li; Kidwai, Sana; Biberauer, Theresa; Bryant, Christopher
	In: Journal of Open Humanities Data; Vol 7 (2021); 7 ; 2059-481X (2021)
	Abstract: Natural Language Processing (NLP) tools typically struggle to process code-switched data and so linguists are commonly forced to annotate such data manually. As this data becomes more readily available, automatic tools are increasingly needed to help speed up the annotation process and improve consistency. Last year, such a toolkit was developed to semi-automatically annotate transcribed bilingual code-switched Vietnamese-English speech data with token-based language information and POS tags (hereafter the CanVEC toolkit, L. Nguyen & Bryant, 2020). In this work, we extend this methodology to another language pair, Hindi-English, to explore the extent to which we can standardise the automation process. Specifically, we applied the principles behind the CanVEC toolkit to data from the International Conference on Natural Language Processing (ICON) 2016 shared task, which consists of social media posts (Facebook, Twitter and WhatsApp) that have been annotated with language and POS tags (Molina et al., 2016). We used the ICON-2016 annotations as the gold-standard labels in the language identification task. Ultimately, our tool achieved an F1 score of 87.99% on the ICON-2016 data. We then evaluated the first 500 tokens of each social media subset manually, and found almost 40% of all errors were caused entirely by problems with the gold-standard, i.e., our system was correct. It is thus likely that the overall accuracy of our system is higher than reported. This shows great potential for effectively automating the annotation of code-switched corpora, on different language combinations, and in different genres. We finally discuss some limitations of our approach and release our code and human evaluation together with this paper.
	Keyword: automatic annotation; code-switching; Computational Linguistics; English; Hindi; language identification; Linguistics; Vietnamese
	URL: https://openhumanitiesdata.metajnl.com/jms/article/view/44 https://doi.org/10.5334/johd.44
	BASE

4	Vowel duration and consonant voicing: A production study ...
	Coretta, Stefano. - : Open Science Framework, 2021
	BASE

5	Mental simulation of the illusory and the factual in negation processing ...
	Vanek, Norbert. - : Open Science Framework, 2021
	BASE

6	Does the language you speak shape the way you think about the world? ...
	Djalal, Farah. - : Open Science Framework, 2021
	BASE

7	Investigating the processing of question types in contexts with different prior beliefs ...
	Macuch Silva, Vinicius. - : Open Science Framework, 2021
	BASE

8	Social Factors in the Production, Perception and Processing of Contact Varieties: Evidence from Bilingual Corpora, Nativeness Evaluations, and Real-time Processing (EEG) of Spanish-accented English
	Sabo, Emily. - 2021
	BASE

9	A contrastive study of the EFL vowel system in native Spanish, French, German and Russian learners
	Juan Checa, José Javier. - : Universitat Jaume I, 2021
	In: TDX (Tesis Doctorals en Xarxa) (2021)
	BASE

10	The French-English Bilingual Mind
	Dulaney, Quinlan Bovee
	In: The Journal of Purdue Undergraduate Research (2021)
	BASE

11	Transliteracy Sponsorscapes: Potential for Attunement and Diffraction in Literacy Learning
	Shelton, Holly. - 2021
	BASE
