Paper Title

Teacher Forcing Recovers Reward Functions for Text Generation

Authors

Yongchang Hao, Yuxin Liu, Lili Mou

Abstract

Reinforcement learning (RL) has been widely used in text generation to alleviate the exposure bias issue or to utilize non-parallel datasets. The reward function plays an important role in making RL training successful. However, previous reward functions are typically task-specific and sparse, restricting the use of RL. In our work, we propose a task-agnostic approach that derives a step-wise reward function directly from a model trained with teacher forcing. We additionally propose a simple modification to stabilize the RL training on non-parallel datasets with our induced reward function. Empirical results show that our method outperforms self-training and reward regression methods on several text generation tasks, confirming the effectiveness of our reward function.
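
For intuition, here is a minimal sketch of what a step-wise reward induced by a teacher-forcing-trained model could look like, assuming the reward for token y_t is taken as its log-probability log p(y_t | y_<t) under that model. This is an illustrative assumption, not the paper's exact derivation; the gpt2 checkpoint and the stepwise_rewards helper are hypothetical choices for the example.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative model: any model trained with teacher forcing (MLE) would do.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def stepwise_rewards(text: str) -> torch.Tensor:
    """Return one reward per token: log p(y_t | y_<t) under the MLE model."""
    ids = tokenizer(text, return_tensors="pt").input_ids  # shape (1, T)
    with torch.no_grad():
        logits = model(ids).logits  # shape (1, T, vocab)
    # The logit at position t predicts token t+1, so shift by one.
    log_probs = torch.log_softmax(logits[:, :-1], dim=-1)
    targets = ids[:, 1:]  # tokens y_1 .. y_{T-1}
    rewards = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
    return rewards.squeeze(0)  # shape (T-1,): one reward per step

print(stepwise_rewards("Teacher forcing recovers reward functions."))
```

Because the reward is defined at every decoding step rather than only at the end of the sequence, it is dense, which is what makes it usable as a drop-in signal for RL training in the setting the abstract describes.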
