Paper Title

Studying the Effects of Cognitive Biases in Evaluation of Conversational Agents

Paper Authors

Sashank Santhanam, Alireza Karduni, Samira Shaikh

Paper Abstract

Humans quite frequently interact with conversational agents. The rapid advancement in generative language modeling through neural networks has helped advance the creation of intelligent conversational agents. Researchers typically evaluate the output of their models through crowdsourced judgments, but there are no established best practices for conducting such studies. Moreover, it is unclear whether cognitive biases in decision-making are affecting crowdsourced workers' judgments when they undertake these tasks. To investigate, we conducted a between-subjects study with 77 crowdsourced workers to understand the role of cognitive biases, specifically anchoring bias, when humans are asked to evaluate the output of conversational agents. Our results provide insight into how best to evaluate conversational agents. We find that increased consistency in ratings across two experimental conditions may be a result of anchoring bias. We also determine that external factors such as time and prior experience with similar tasks have an effect on inter-rater consistency.
