Page: 1 2 3 4 5 6 7 8 9... 35
81 | Multimodal datasets: misogyny, pornography, and malignant stereotypes ...

Abstract:
We have now entered the era of trillion-parameter machine learning models trained on billion-sized datasets scraped from the internet. The emergence of these gargantuan datasets has prompted a formidable body of critical work calling for caution in how such datasets are generated. This work addresses the dubious curation practices used to build these datasets, the sordid quality of the alt-text data available on the World Wide Web, the problematic content of the CommonCrawl dataset, which is often used as a source for training large language models, and the entrenched biases in large-scale visio-linguistic models (such as OpenAI's CLIP model) trained on opaque datasets (WebImageText). Against the backdrop of these specific calls for caution, we examine the recently released LAION-400M dataset, a CLIP-filtered dataset of image/alt-text pairs parsed from the CommonCrawl dataset. We found that the dataset contains troublesome and explicit image and text pairs of rape, pornography, malign ... : 33 pages ...

Keyword:
Computers and Society cs.CY; FOS Computer and information sciences

URL: https://dx.doi.org/10.48550/arxiv.2110.01963 https://arxiv.org/abs/2110.01963

BASE

83 | Identifying Causal Influences on Publication Trends and Behavior: A Case Study of the Computational Linguistics Community ...
BASE

84 | Social Analysis of Young Basque Speaking Communities in Twitter ...
BASE

85 | A bifurcation threshold for contact-induced language change ...
BASE

86 | Improving Classifier Training Efficiency for Automatic Cyberbullying Detection with Feature Density ...
BASE

87 | Learning Information Literacy across the Globe. Frankfurt am Main, May 10th 2019 ...
BASE

88 | Phraseological Units as a Means of Understanding the Culture of Peoples ... : Phraseological Units as a Means of Understanding the People's Culture ...
BASE

89 | Decision Making For Celebrity Branding: An Opinion Mining Approach Based On Polarity And Sentiment Analysis Using Twitter Consumer-Generated Content (CGC) ...
BASE

90 | Polarity in the Classroom: A Case Study Leveraging Peer Sentiment Toward Scalable Assessment ...
BASE

91 | Machine Translationese: Effects of Algorithmic Bias on Linguistic Complexity in Machine Translation ...
BASE

92 | Understanding and Countering Stereotypes: A Computational Approach to the Stereotype Content Model ...
BASE

93 | The advent and fall of a vocabulary learning bias from communicative efficiency ...
BASE

94 | Quantifying Gender Biases Towards Politicians on Reddit ...
BASE

95 | Using Sociolinguistic Variables to Reveal Changing Attitudes Towards Sexuality and Gender ...
BASE

97 | Lexical Sorting Centrality to Distinguish Spreading Abilities of Nodes in Complex Networks under the Susceptible-Infectious-Recovered (SIR) Model ...
BASE

98 | The brain is a computer is a brain: neuroscience's internal debate and the social significance of the Computational Metaphor ...
BASE

99 | Capturing social media expressions during the COVID-19 pandemic in Argentina and forecasting mental health and emotions ...
BASE