Paper Title
Offline Reinforcement Learning from Human Feedback in Real-World Sequence-to-Sequence Tasks
Authors
Abstract
Large volumes of interaction logs can be collected from NLP systems that are deployed in the real world. How can this wealth of information be leveraged? Using such interaction logs in an offline reinforcement learning (RL) setting is a promising approach. However, due to the nature of NLP tasks and the constraints of production systems, a series of challenges arise. We present a concise overview of these challenges and discuss possible solutions.