Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks

Paper Title

Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks

Authors

Julia Kreutzer, Stefan Riezler, Carolin Lawrence

Abstract

Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such interaction logs in an offline reinforcement learning (RL) setting is a promising approach. However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions.
