Paper Title
Teacher Forcing Recovers Reward Functions for Text Generation
Paper Authors
Paper Abstract
Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an important role in making RL training successful. However, previous reward functions are typically task-specific and sparse, restricting the use of RL. In our work, we propose a task-agnostic approach that derives a step-wise reward function directly from a model trained with teacher forcing. We additionally propose a simple modification to stabilize the RL training on non-parallel datasets with our induced reward function. Empirical results show that our method outperforms self-training and reward regression methods on several text generation tasks, confirming the effectiveness of our reward function.
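The abstract's key technical claim is that a step-wise reward can be read off a model trained with ordinary teacher forcing. Below is a minimal PyTorch sketch of one plausible instantiation, in which the reward at each step is the teacher-forcing model's log-probability of the sampled next token. The helper name `stepwise_reward` and this exact reward form are illustrative assumptions, not the paper's actual derivation, and a Hugging Face-style causal LM returning `.logits` is assumed.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def stepwise_reward(teacher_model, input_ids):
    """Score each token of a sampled sequence with a teacher-forcing model.

    Assumes `teacher_model(input_ids).logits` returns a tensor of shape
    (batch, seq_len, vocab), as in Hugging Face causal LMs.

    input_ids: LongTensor of shape (batch, seq_len), a sampled sequence.
    Returns a (batch, seq_len - 1) tensor of per-step rewards.
    """
    logits = teacher_model(input_ids).logits            # (batch, seq_len, vocab)
    # Positions 0..T-2 predict tokens 1..T-1 under teacher forcing.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    # Reward at step t = log-prob the teacher model assigns to token t+1.
    targets = input_ids[:, 1:].unsqueeze(-1)            # next-token targets
    return log_probs.gather(-1, targets).squeeze(-1)    # (batch, seq_len - 1)
```

Under these assumptions, a policy-gradient trainer could sum or discount these per-step rewards over each sampled continuation, giving a dense training signal in place of the single sequence-level score that task-specific sparse rewards provide.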