Title


RoMFAC: A robust mean-field actor-critic reinforcement learning against adversarial perturbations on states

Authors

Zhou, Ziyuan; Liu, Guanjun

Abstract


Multi-agent deep reinforcement learning makes optimal decisions that depend on the system states observed by agents, but any uncertainty in the observations may mislead agents into taking wrong actions. Mean-Field Actor-Critic reinforcement learning (MFAC) is well known in the multi-agent field because it effectively handles the scalability problem. However, it is sensitive to state perturbations, which can significantly degrade team rewards. This work proposes Robust Mean-Field Actor-Critic reinforcement learning (RoMFAC), which has two innovations: 1) a new objective function for training actors, composed of a \emph{policy gradient function}, related to the expected cumulative discounted reward on sampled clean states, and an \emph{action loss function}, representing the difference between actions taken on clean and adversarial states; and 2) a repetitive regularization of the action loss, ensuring that the trained actors obtain excellent performance. Furthermore, this work proposes a game model named the State-Adversarial Stochastic Game (SASG). Although a Nash equilibrium of SASG may not exist, adversarial perturbations to states in RoMFAC are proven to be defensible based on SASG. Experimental results show that RoMFAC is robust against adversarial perturbations while maintaining competitive performance in environments without perturbations.
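The abstract describes a combined actor objective: a policy-gradient term on clean states plus an action-loss term measuring the gap between actions on clean and adversarial states. The following is a minimal sketch of that idea, not the paper's actual implementation: the mean-squared distance as the action loss, the trade-off coefficient `lam`, and the toy linear actor are all assumptions for illustration.

```python
import numpy as np

def action_loss(actor, clean_states, adv_states):
    """Mean squared difference between actions taken on clean and
    adversarially perturbed states (one plausible choice of distance;
    the paper's exact measure is not specified in the abstract)."""
    a_clean = actor(clean_states)
    a_adv = actor(adv_states)
    return float(np.mean((a_clean - a_adv) ** 2))

def actor_objective(pg_term, actor, clean_states, adv_states, lam=0.5):
    """Hypothetical combined objective: the standard policy-gradient
    term minus a weighted action-loss penalty. `lam` is an assumed
    trade-off coefficient, not a value from the paper."""
    return pg_term - lam * action_loss(actor, clean_states, adv_states)

# Toy linear actor for illustration only.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))
actor = lambda s: s @ W

states = rng.normal(size=(8, 4))
perturbed = states + 0.01 * rng.normal(size=states.shape)  # small state perturbation
obj = actor_objective(1.0, actor, states, perturbed)
print(obj)
```

Maximizing such an objective pushes the actor toward high reward while penalizing policies whose actions change much under small state perturbations, which matches the robustness goal stated above.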
