Paper Title
LEATHER: A Framework for Learning to Generate Human-like Text in Dialogue
Paper Authors
Paper Abstract
Algorithms for text-generation in dialogue can be misguided. For example, in task-oriented settings, reinforcement learning that optimizes only task-success can lead to abysmal lexical diversity. We hypothesize this is due to poor theoretical understanding of the objectives in text-generation and their relation to the learning process (i.e., model training). To this end, we propose a new theoretical framework for learning to generate text in dialogue. Compared to existing theories of learning, our framework allows for analysis of the multi-faceted goals inherent to text-generation. We use our framework to develop theoretical guarantees for learners that adapt to unseen data. As an example, we apply our theory to study data-shift within a cooperative learning algorithm proposed for the GuessWhat?! visual dialogue game. From this insight, we propose a new algorithm, and empirically, we demonstrate our proposal improves both task-success and human-likeness of the generated text. Finally, we show statistics from our theory are empirically predictive of multiple qualities of the generated dialogue, suggesting our theory is useful for model-selection when human evaluations are not available.
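The abstract's point about reinforcement learning collapsing lexical diversity can be made concrete with a standard diversity statistic. The sketch below is illustrative only and is not taken from the paper: it assumes distinct-n (the ratio of unique to total n-grams) as a proxy for the lexical diversity being discussed; the function name and example utterances are hypothetical.

```python
# Minimal sketch: distinct-n lexical diversity, a common proxy metric.
# Not the paper's method; purely an illustration of the quantity that
# task-success-only reinforcement learning tends to degrade.
from typing import Iterable, List


def distinct_n(utterances: Iterable[List[str]], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams over tokenized utterances."""
    total = 0
    unique = set()
    for tokens in utterances:
        ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(ngrams)
        unique.update(ngrams)
    return len(unique) / total if total else 0.0


# A policy that keeps repeating the same question scores low;
# a more varied questioner scores higher.
print(distinct_n([["is", "it", "red"], ["is", "it", "red"]], n=2))    # 0.5
print(distinct_n([["is", "it", "red"], ["is", "it", "round"]], n=2))  # 0.75
```

In a GuessWhat?!-style questioner, a reward that counts only task success is indifferent between these two dialogues, which is the failure mode the abstract highlights.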