Paper Title

C-Learning: Learning to Achieve Goals via Recursive Classification

Paper Authors

Benjamin Eysenbach, Ruslan Salakhutdinov, Sergey Levine

Paper Abstract

We study the problem of predicting and controlling the future state distribution of an autonomous agent. This problem, which can be viewed as a reframing of goal-conditioned reinforcement learning (RL), is centered around learning a conditional probability density function over future states. Instead of directly estimating this density function, we indirectly estimate this density function by training a classifier to predict whether an observation comes from the future. Via Bayes' rule, predictions from our classifier can be transformed into predictions over future states. Importantly, an off-policy variant of our algorithm allows us to predict the future state distribution of a new policy, without collecting new experience. This variant allows us to optimize functionals of a policy's future state distribution, such as the density of reaching a particular goal state. While conceptually similar to Q-learning, our work lays a principled foundation for goal-conditioned RL as density estimation, providing justification for goal-conditioned methods used in prior work. This foundation makes hypotheses about Q-learning, including the optimal goal-sampling ratio, which we confirm experimentally. Moreover, our proposed method is competitive with prior goal-conditioned RL methods.
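As a brief illustration of the Bayes' rule step described above (the notation here is ours, not taken verbatim from the paper): suppose a classifier C(s_{t+} | s_t, a_t) is trained with equal class priors to distinguish states drawn from the policy's future-state distribution p^π(s_{t+} | s_t, a_t) (label 1) from states drawn from a marginal state distribution p(s_{t+}) (label 0). The Bayes-optimal classifier then satisfies

```latex
C(s_{t+} \mid s_t, a_t)
  = \frac{p^{\pi}(s_{t+} \mid s_t, a_t)}{p^{\pi}(s_{t+} \mid s_t, a_t) + p(s_{t+})},
\qquad\text{so}\qquad
p^{\pi}(s_{t+} \mid s_t, a_t)
  = p(s_{t+}) \,\frac{C(s_{t+} \mid s_t, a_t)}{1 - C(s_{t+} \mid s_t, a_t)}.
```

In other words, the classifier's odds ratio recovers the future-state density up to the marginal p(s_{t+}), which is what allows classification to stand in for direct density estimation.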
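Below is a minimal, self-contained sketch of the on-policy classifier-training step implied by the abstract. The sampler, network sizes, and the geometric-offset approximation of the discounted future are illustrative assumptions, not the authors' implementation.

```python
# Sketch (not the authors' code): train a binary classifier to tell whether a
# candidate state s_plus was drawn from the discounted future of (s_t, a_t)
# (label 1) or from the marginal state distribution (label 0).
import numpy as np
import torch
import torch.nn as nn

STATE_DIM, ACTION_DIM, GAMMA = 4, 2, 0.9

classifier = nn.Sequential(  # C(s_t, a_t, s_plus) -> logit of "comes from the future"
    nn.Linear(STATE_DIM + ACTION_DIM + STATE_DIM, 64),
    nn.ReLU(),
    nn.Linear(64, 1),
)
optimizer = torch.optim.Adam(classifier.parameters(), lr=1e-3)
bce = nn.BCEWithLogitsLoss()

def sample_batch(trajectory, batch_size=32):
    """Toy sampler: positives come from the discounted future of the same
    trajectory; negatives are random states from the trajectory (a stand-in
    for the marginal state distribution)."""
    states, actions = trajectory
    T = len(states)
    t = np.random.randint(0, T - 1, size=batch_size)
    # A geometric offset approximates sampling from the discounted future.
    offset = np.minimum(np.random.geometric(1 - GAMMA, size=batch_size), T - 1 - t)
    s_t, a_t = states[t], actions[t]
    s_future = states[t + offset]                                 # label 1
    s_random = states[np.random.randint(0, T, size=batch_size)]   # label 0
    return map(torch.as_tensor, (s_t, a_t, s_future, s_random))

# One gradient step on a synthetic random-walk trajectory.
states = np.cumsum(np.random.randn(100, STATE_DIM).astype(np.float32), axis=0)
actions = np.random.randn(100, ACTION_DIM).astype(np.float32)
s_t, a_t, s_pos, s_neg = sample_batch((states, actions))

logits_pos = classifier(torch.cat([s_t, a_t, s_pos], dim=-1))
logits_neg = classifier(torch.cat([s_t, a_t, s_neg], dim=-1))
loss = (bce(logits_pos, torch.ones_like(logits_pos))
        + bce(logits_neg, torch.zeros_like(logits_neg)))
optimizer.zero_grad()
loss.backward()
optimizer.step()
# Bayes' rule then turns classifier probabilities into the density ratio
# p_future(s_plus | s_t, a_t) / p_marginal(s_plus) = C / (1 - C).
```

The off-policy variant mentioned in the abstract instead constructs the positive targets recursively from the classifier's own predictions under a new policy; that bootstrapping step is omitted from this sketch.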
