Paper Title
Reproducible Subjective Evaluation
Paper Authors
Paper Abstract
Human perceptual studies are the gold standard for the evaluation of many research tasks in machine learning, linguistics, and psychology. However, these studies require significant time and cost to perform. As a result, many researchers use objective measures that can correlate poorly with human evaluation. When subjective evaluations are performed, they are often not reported with sufficient detail to ensure reproducibility. We propose Reproducible Subjective Evaluation (ReSEval), an open-source framework for quickly deploying crowdsourced subjective evaluations directly from Python. ReSEval lets researchers launch A/B, ABX, Mean Opinion Score (MOS), and MUltiple Stimuli with Hidden Reference and Anchor (MUSHRA) tests on audio, image, text, or video data from a command-line interface or using one line of Python, making it as easy to run as objective evaluation. With ReSEval, researchers can reproduce each other's subjective evaluations by sharing a configuration file and the audio, image, text, or video files.
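The abstract describes launching an evaluation with one line of Python and reproducing it by sharing a configuration file together with the stimulus files. The following is a minimal sketch of what such a call could look like; the reseval.run function name, its arguments, and the config.yaml and stimuli/ paths are illustrative assumptions rather than the confirmed ReSEval API.

    # Hypothetical sketch, not the confirmed ReSEval API:
    # launch a crowdsourced subjective test from a shared configuration file
    # and a directory of audio, image, text, or video stimuli.
    import reseval  # assumes the open-source ReSEval package is installed

    # 'reseval.run', 'config.yaml', and 'stimuli/' are assumed names for illustration
    reseval.run('config.yaml', 'stimuli/')

Under this assumption, reproducing another researcher's subjective evaluation would amount to rerunning the same call with their shared configuration file and stimulus directory.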