Hits 1 – 13 of 13

1	Retweet communities reveal the main sources of hate speech
	Evkoski, Bojan; Pelicon, Andraž; Mozetič, Igor...
	In: PLoS One (2022)
	BASE

2	Slovenian Twitter dataset 2018-2020 1.0
	Evkoski, Bojan; Pelicon, Andraž; Mozetič, Igor. - : Jožef Stefan Institute, 2021
	BASE

3	Italian YouTube Hate Speech Corpus
	Cinelli, Matteo; Pelicon, Andraž; Mozetič, Igor. - : Jožef Stefan Institute, 2021
	BASE

4	Slovenian Twitter hate speech dataset IMSyPP-sl
	Kralj Novak, Petra; Mozetič, Igor; Ljubešić, Nikola. - : Jožef Stefan Institute, 2021
	Abstract: A hand-labeled training set (50,000 tweets labeled twice) and evaluation set (10,000 tweets labeled twice) for hate speech on Slovenian Twitter. The data files contain tweet IDs, hate speech type, hate speech target, and annotator ID. To obtain the full text of the dataset, please contact the first author.
	Hate speech type:
	1. Appropriate - has no target
	2. Inappropriate (contains terms that are obscene, vulgar; but the text is not directed at any person specifically) - has no target
	3. Offensive (including offensive generalization, contempt, dehumanization, indirect offensive remarks)
	4. Violent (author threatens, indulges, desires, or calls for physical violence against a target; it also includes calling for, denying, or glorifying war crimes and crimes against humanity)
	Hate speech target:
	1. Racism (intolerance based on nationality, ethnicity, language, towards foreigners; and based on race, skin color)
	2. Migrants (intolerance of refugees or migrants, offensive generalization, call for their exclusion, restriction of rights, non-acceptance, denial of assistance…)
	3. Islamophobia (intolerance towards Muslims)
	4. Antisemitism (intolerance of Jews; also includes conspiracy theories, Holocaust denial or glorification, offensive stereotypes…)
	5. Religion (other than above)
	6. Homophobia (intolerance based on sexual orientation and / or identity, calls for restrictions on the rights of LGBTQ persons)
	7. Sexism (offensive gender-based generalization, misogynistic insults, unjustified gender discrimination)
	8. Ideology (intolerance based on political affiliation, political belief, ideology… e.g. “communists”, “leftists”, “home defenders”, “socialists”, “activists for…”)
	9. Media (journalists and media, also includes allegations of unprofessional reporting, false news, bias)
	10. Politics (intolerance towards individual politicians, authorities, system, political parties)
	11. Individual (intolerance toward any other individual due to individual characteristics; like commentator, neighbor, acquaintance)
	12. Other (intolerance towards members of other groups due to belonging to this group; write in the blank column on the right which group it is)
	Training dataset: The training set is sampled from data collected between December 2017 and February 2020. The sampling was intentionally biased to contain as much hate speech as possible. A simple model was used to flag potential hate speech content; additionally, filtering by users and by tweet length (number of characters) was applied. 50,000 tweets were selected for annotation.
	Evaluation dataset: The evaluation set is sampled from data collected between February 2020 and August 2020. Unlike the training set, the evaluation set is an unbiased random sample. Since the evaluation set is from a later period than the training set, the possibility of data linkage is minimized. Furthermore, the estimates of model performance made on the evaluation set are realistic, or even pessimistic, since the evaluation set is characterized by a new topic: Covid-19. 10,000 tweets were selected for the evaluation set.
	Annotation results: Each tweet was annotated twice: in 90% of the cases by two different annotators and in 10% of the cases by the same annotator. Special attention was devoted to evening out the overlap between annotators to get agreement estimates on equally sized sets. Ten annotators were engaged for the annotation campaign. They were given annotation guidelines, a training session, and a test on a small set to evaluate their understanding of the task and their commitment before starting the annotation procedure. Annotator agreement in terms of Krippendorff's alpha is around 0.6. Annotation agreement scores are detailed in the accompanying report files for each dataset separately. The annotation process lasted four months, and it required about 1,200 person-hours for the ten annotators to complete the task.
	Keyword: hate speech; inappropriate language; manual annotation; offensive language; Twitter; violent language
	URL: http://hdl.handle.net/11356/1398
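The agreement figure quoted in the abstract (Krippendorff's alpha of around 0.6 on tweets annotated twice) can be illustrated with a small sketch. The Python below computes Krippendorff's alpha for nominal labels via the coincidence matrix. The function name and the toy label pairs are hypothetical and are not taken from the dataset or its report files. Using the nominal variant of alpha is also an assumption, since the abstract does not say which distance metric was applied.

```python
import math
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is an iterable of label sequences, one sequence per tweet,
    holding every label that tweet received (two per tweet in this sketch).
    """
    # Coincidence matrix: each ordered pair of labels within a unit
    # contributes 1 / (m - 1), where m is the number of labels in the unit.
    coincidences = Counter()
    for labels in units:
        m = len(labels)
        if m < 2:
            continue  # a unit with a single label cannot be paired
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per category and the total number of pairable labels.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())

    # Observed and expected disagreement (up to the shared normalisation).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    )
    if expected == 0:
        return math.nan  # only one category in use: alpha is undefined
    return 1.0 - (n - 1) * observed / expected


# Hypothetical toy data: four tweets, each labeled twice.
toy_pairs = [
    ("appropriate", "appropriate"),
    ("appropriate", "offensive"),
    ("offensive", "offensive"),
    ("offensive", "offensive"),
]

if __name__ == "__main__":
    print(round(krippendorff_alpha_nominal(toy_pairs), 4))
```

Values near 1 indicate near-perfect agreement and values near 0 indicate agreement at chance level. Because the four hate speech types are ordered by severity, an ordinal distance metric might be a reasonable alternative to the nominal one sketched here.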
	BASE

5	English YouTube Hate Speech Corpus
	Ljubešić, Nikola; Mozetič, Igor; Cinelli, Matteo. - : Jožef Stefan Institute, 2021
	BASE

6	Tweets about impact investing
	Kralj Novak, Petra; de Amicis, Luisa; Mozetič, Igor. - : Jožef Stefan Institute, 2018
	BASE

7	Brexit stance annotated tweets
	Grčar, Miha; Cherepnalkoski, Darko; Mozetič, Igor. - : Jožef Stefan Institute, 2017
	BASE

8	Dataset of European Parliament roll-call votes and Twitter activities MEP 1.0
	Cherepnalkoski, Darko; Karpf, Andreas; Mozetič, Igor. - : Jožef Stefan Institute, 2016
	BASE

9	Twitter sentiment for 15 European languages
	Mozetič, Igor; Grčar, Miha; Smailović, Jasmina. - : Jožef Stefan Institute, 2016
	BASE

10	Emoji Sentiment Ranking 1.0
	Kralj Novak, Petra; Smailović, Jasmina; Sluban, Borut. - : Jožef Stefan Institute, 2015
	BASE

11	Sentiment of Emojis ...
	Novak, Petra Kralj; Smailović, Jasmina; Sluban, Borut. - : arXiv, 2015
	BASE

12	Sentiment of Emojis
	Kralj Novak, Petra; Smailović, Jasmina; Sluban, Borut. - : Public Library of Science, 2015
	BASE

13	Extraction of Temporal Networks from Term Co-Occurrences in Online Textual Sources
	Popović, Marko; Štefančić, Hrvoje; Sluban, Borut. - : Public Library of Science, 2014
	BASE
