Paper Title

Reinforcement Teaching

Paper Authors

Calarina Muslimani, Alex Lewandowski, Dale Schuurmans, Matthew E. Taylor, Jun Luo

Paper Abstract

Machine learning algorithms learn to solve a task, but are unable to improve their ability to learn. Meta-learning methods learn about machine learning algorithms and improve them so that they learn more quickly. However, existing meta-learning methods are either hand-crafted to improve one specific component of an algorithm or only work with differentiable algorithms. We develop a unifying meta-learning framework, called Reinforcement Teaching, to improve the learning process of any algorithm. Under Reinforcement Teaching, a teaching policy is learned, through reinforcement, to improve a student's learning algorithm. To learn an effective teaching policy, we introduce the parametric-behavior embedder that learns a representation of the student's learnable parameters from its input/output behavior. We further use learning progress to shape the teacher's reward, allowing it to more quickly maximize the student's performance. To demonstrate the generality of Reinforcement Teaching, we conduct experiments in which a teacher learns to significantly improve both reinforcement and supervised learning algorithms. Reinforcement Teaching outperforms previous work using heuristic reward functions and state representations, as well as other parameter representations.
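
The abstract does not state the exact form of the learning-progress reward. As a rough illustration only, assuming learning progress is measured as the change in a student performance score between consecutive teaching steps, the idea might be sketched as follows. The function name and the accuracy values are hypothetical and are not taken from the paper.

```python
def learning_progress_rewards(performances):
    """Return one reward per teaching step: the change in student performance.

    `performances` holds the student's score before any teaching and after
    each teacher action, so n + 1 scores yield n rewards.
    Assumption: this difference form is an illustration, not the paper's definition.
    """
    return [after - before for before, after in zip(performances, performances[1:])]


# Hypothetical student accuracy before teaching and after each of three teacher actions.
accuracies = [0.20, 0.50, 0.65, 0.70]
rewards = learning_progress_rewards(accuracies)

# The rewards telescope: their sum equals final minus initial performance.
total = sum(rewards)
```

Under this assumption the teacher receives a signal after every action rather than only at the end of the student's training, while the undiscounted sum of rewards still equals the student's overall improvement.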
