Catalogue search • Linguistik portal • Fachinformationsdienst (FID)

1	Classifying Bias in Large Multilingual Corpora via Crowdsourcing and Topic Modeling
	Caljean, Brianna; Calvert, Katherine; Chang, Ashley; Frank, Elliot; Garay Jáuregui, Rosana; Palo, Geoffrey; Rinker, Ryan; Weakly, Gareth; Wolfrey, Nicolette; Zhang, William. - 2018
	Abstract: Our project extends previous algorithmic approaches to finding bias in large text corpora. We used multilingual topic modeling to examine language-specific bias in the English, Spanish, and Russian versions of Wikipedia. In particular, we placed Spanish articles discussing the Cold War on a Russian-English viewpoint spectrum based on similarity in topic distribution. We then crowdsourced human annotations of Spanish Wikipedia articles for comparison to the topic model. Our hypothesis was that human annotators and topic modeling algorithms would provide correlated results for bias. However, that was not the case. Our annotators indicated that humans were more perceptive of sentiment in article text than of topic distribution, which suggests that our classifier provides a different perspective on a text’s bias.
	Keyword: Gemstone Team BIASES
	URL: https://doi.org/10.13016/M2R49GC7C http://hdl.handle.net/1903/20668
	BASE
