Paper Title
Can we trust online crowdworkers? Comparing online and offline participants in a preference test of virtual agents
Paper Authors
Paper Abstract
Conducting user studies is a crucial component of many scientific fields. While some studies require participants to be physically present, others can be conducted both physically (e.g., in-lab) and online (e.g., via crowdsourcing). Inviting participants to the lab can be a time-consuming and logistically difficult endeavor, and research groups may sometimes be unable to run in-lab experiments at all, for example because of a pandemic. Crowdsourcing platforms such as Amazon Mechanical Turk (AMT) or Prolific can therefore be a suitable alternative for running certain experiments, such as evaluating virtual agents. Although previous studies have investigated the use of crowdsourcing platforms for running experiments, there is still uncertainty as to whether the results are reliable for perceptual studies. Here we replicate a previous experiment in which participants evaluated a gesture generation model for virtual agents. The experiment was conducted across three participant pools -- in-lab, Prolific, and AMT -- with similar demographics across the in-lab participants and the Prolific platform. Our results show no difference between the three participant pools in their evaluations of the gesture generation models or in their reliability scores. The results indicate that online platforms can successfully be used for perceptual evaluations of this kind.