Title

Coagent Networks Revisited

Authors

Modjtaba Shokrian Zini, Mohammad Pedramfar, Matthew Riemer, Ahmadreza Moradipari, Miao Liu

Abstract

Coagent networks formalize the concept of arbitrary networks of stochastic agents that collaborate to take actions in a reinforcement learning environment. Prominent examples of coagent networks in action include approaches to hierarchical reinforcement learning (HRL), such as those using options, which attempt to address the exploration-exploitation trade-off by sequencing multiple stochastic networks within the HRL agent to introduce abstract actions at different levels. We first provide a unifying perspective on the many diverse examples that fall under coagent networks. We do so by formalizing the rules of execution in a coagent network, enabled by the novel and intuitive idea of execution paths in a coagent network. Motivated by parameter sharing in the hierarchical option-critic architecture, we revisit coagent network theory and achieve a much shorter proof of the policy gradient theorem using our idea of execution paths, without any assumption on how parameters are shared among coagents. We then generalize our setting and proof to include the scenario where coagents act asynchronously. This new perspective and theorem also lead to more mathematically accurate and performant algorithms than those in the existing literature. Lastly, by running nonstationary RL experiments, we survey the performance and properties of different generalizations of option-critic models.
