Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment

Paper title

Human Decision Makings on Curriculum Reinforcement Learning with Difficulty Adjustment

Authors

Yilei Zeng, Jiali Duan, Yang Li, Emilio Ferrara, Lerrel Pinto, C.-C. Jay Kuo, Stefanos Nikolaidis

Abstract

Human-centered AI considers human experiences with AI performance. While abundant research has helped AI achieve superhuman performance through either fully automatic or weakly supervised learning, fewer efforts have explored how AI can adapt to a human's preferred skill level given fine-grained input. In this work, we guide curriculum reinforcement learning results towards a preferred performance level that is neither too hard nor too easy by learning from the human decision process. To achieve this, we developed a portable, interactive platform that enables the user to interact with agents online by manipulating the task difficulty, observing performance, and providing curriculum feedback. Our system is highly parallelizable, making it possible for a human to train large-scale reinforcement learning applications that require millions of samples without a server. The results demonstrate the effectiveness of an interactive curriculum for reinforcement learning with a human in the loop. They show that reinforcement learning performance can successfully adjust in sync with the human-desired difficulty level. We believe this research will open new doors for achieving flow and personalized adaptive difficulties.
