ArgSciChat: A Dataset for Argumentative Dialogues on Scientific Papers

Paper Title

ArgSciChat: A Dataset for Argumentative Dialogues on Scientific Papers

Authors

Federico Ruggeri, Mohsen Mesgar, Iryna Gurevych

Abstract

The applications of conversational agents for scientific disciplines (as expert domains) are understudied due to the lack of dialogue data to train such agents. While most data collection frameworks, such as Amazon Mechanical Turk, foster data collection for generic domains by connecting crowd workers and task designers, these frameworks are not well optimized for data collection in expert domains. Scientists are rarely present in these frameworks due to their limited time budget. Therefore, we introduce a novel framework to collect dialogues between scientists as domain experts on scientific papers. Our framework lets scientists present their scientific papers as groundings for dialogues and participate in dialogues on papers whose titles interest them. We use our framework to collect a novel argumentative dialogue dataset, ArgSciChat. It consists of 498 messages collected from 41 dialogues on 20 scientific papers. Alongside an extensive analysis of ArgSciChat, we evaluate a recent conversational agent on our dataset. Experimental results show that this agent performs poorly on ArgSciChat, motivating further research on argumentative scientific agents. We release our framework and the dataset.
