Catalogue search • Linguistik portal • Fachinformationsdienst (FID)

1	Reproducible Subjective Evaluation ...
	Morrison, Max; Tang, Brian; Tan, Gefei; Pardo, Bryan. - : arXiv, 2022
	Abstract: Human perceptual studies are the gold standard for the evaluation of many research tasks in machine learning, linguistics, and psychology. However, these studies require significant time and cost to perform. As a result, many researchers use objective measures that can correlate poorly with human evaluation. When subjective evaluations are performed, they are often not reported with sufficient detail to ensure reproducibility. We propose Reproducible Subjective Evaluation (ReSEval), an open-source framework for quickly deploying crowdsourced subjective evaluations directly from Python. ReSEval lets researchers launch A/B, ABX, Mean Opinion Score (MOS) and MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) tests on audio, image, text, or video data from a command-line interface or using one line of Python, making it as easy to run as objective evaluation. With ReSEval, researchers can reproduce each other's subjective evaluations by sharing a configuration file and the audio, image, text, or video ...
	Note: Submitted to ICLR 2022 Workshop on Setting up ML Evaluation Standards to Accelerate Progress ...
	Keyword: FOS Computer and information sciences; Human-Computer Interaction cs.HC; Machine Learning cs.LG
	URL: https://arxiv.org/abs/2203.04444 https://dx.doi.org/10.48550/arxiv.2203.04444
	BASE
