Paper Title
Dungeons and Dragons as a Dialog Challenge for Artificial Intelligence
Paper Authors
Paper Abstract
AI researchers have posited Dungeons and Dragons (D&D) as a challenge problem to test systems on various language-related capabilities. In this paper, we frame D&D specifically as a dialogue system challenge, where the tasks are to both generate the next conversational turn in the game and predict the state of the game given the dialogue history. We create a gameplay dataset consisting of nearly 900 games, with a total of 7,000 players, 800,000 dialogue turns, 500,000 dice rolls, and 58 million words. We automatically annotate the data with partial state information about the game play. We train a large language model (LM) to generate the next game turn, conditioning it on different information. The LM can respond as a particular character or as the player who runs the game--i.e., the Dungeon Master (DM). It is trained to produce dialogue that is either in-character (roleplaying in the fictional world) or out-of-character (discussing rules or strategy). We perform a human evaluation to determine what factors make the generated output plausible and interesting. We further perform an automatic evaluation to determine how well the model can predict the game state given the history and examine how well tracking the game state improves its ability to produce plausible conversational output.