Paper Title

Generative Augmented Flow Networks

Paper Authors

Ling Pan, Dinghuai Zhang, Aaron Courville, Longbo Huang, Yoshua Bengio

Paper Abstract

The Generative Flow Network (GFlowNet) is a probabilistic framework where an agent learns a stochastic policy for object generation, such that the probability of generating an object is proportional to a given reward function. Its effectiveness has been shown in discovering high-quality and diverse solutions, compared to reward-maximizing reinforcement learning-based methods. Nonetheless, GFlowNets only learn from rewards of the terminal states, which can limit their applicability. Indeed, intermediate rewards play a critical role in learning; for example, intrinsic motivation can provide intermediate feedback even in particularly challenging sparse-reward tasks. Inspired by this, we propose Generative Augmented Flow Networks (GAFlowNets), a novel learning framework that incorporates intermediate rewards into GFlowNets. We specify intermediate rewards via intrinsic motivation to tackle the exploration problem in sparse-reward environments. GAFlowNets can leverage edge-based and state-based intrinsic rewards jointly to improve exploration. Based on extensive experiments on the GridWorld task, we demonstrate the effectiveness and efficiency of GAFlowNet in terms of convergence, performance, and diversity of solutions. We further show that GAFlowNet is scalable to a more complex and large-scale molecule generation domain, where it achieves consistent and significant performance improvement.
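
To make the mechanism in the abstract concrete, below is a minimal, hypothetical PyTorch sketch of how state-based intrinsic rewards (here from Random Network Distillation, one common choice of intrinsic motivation) might be mixed into a trajectory-balance-style GFlowNet loss. The class and function names, the additive mixing rule log R + beta * sum of intrinsic rewards, and all hyperparameters are illustrative assumptions, not the paper's exact formulation.

```python
# Illustrative sketch only: hypothetical names, simplified architecture.
import torch
import torch.nn as nn

class RND(nn.Module):
    """Random Network Distillation: the prediction error against a frozen,
    randomly initialized target network acts as a novelty bonus, so
    rarely visited states yield larger intrinsic rewards."""
    def __init__(self, state_dim: int, feat_dim: int = 64):
        super().__init__()
        self.target = nn.Linear(state_dim, feat_dim)
        self.predictor = nn.Linear(state_dim, feat_dim)
        for p in self.target.parameters():
            p.requires_grad_(False)  # the target network stays fixed

    def intrinsic_reward(self, states: torch.Tensor) -> torch.Tensor:
        # Per-state squared prediction error; this same quantity is
        # typically also used as the predictor's training loss.
        return (self.predictor(states) - self.target(states)).pow(2).mean(-1)

def gaflownet_tb_loss(log_Z, log_pf, log_pb, log_reward, intrinsic, beta=0.1):
    """Trajectory-balance-style loss with intermediate intrinsic rewards.
    log_pf / log_pb: per-step forward/backward log-probabilities.
    intrinsic: per-step intrinsic rewards along the trajectory.
    The additive mixing rule and the beta coefficient are assumptions
    made for illustration, not the paper's exact objective."""
    augmented_log_reward = log_reward + beta * intrinsic.sum()
    return (log_Z + log_pf.sum() - log_pb.sum() - augmented_log_reward).pow(2)

# Dummy usage for a length-T trajectory over state_dim-dimensional states:
T, state_dim = 5, 8
rnd = RND(state_dim)
states = torch.randn(T, state_dim)
loss = gaflownet_tb_loss(
    log_Z=torch.zeros(()),
    log_pf=torch.randn(T),
    log_pb=torch.randn(T),
    log_reward=torch.tensor(1.0),
    intrinsic=rnd.intrinsic_reward(states).detach(),  # no grad through bonus
)
```

In this reading, the intrinsic bonus decays naturally as states become familiar (the predictor catches up to the target), so exploration pressure fades over training; how the paper distributes edge-based versus state-based bonuses across the flow constraints is best taken from the paper itself.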
