Paper Title
Multi-Agent Deep Reinforcement Learning Multiple Access for Heterogeneous Wireless Networks with Imperfect Channels
Paper Authors
Paper Abstract
This paper investigates a futuristic spectrum sharing paradigm for heterogeneous wireless networks with imperfect channels. In these heterogeneous networks, multiple wireless networks adopt different medium access control (MAC) protocols to share a common wireless spectrum, and each network is unaware of the MACs of the others. This paper aims to design a distributed deep reinforcement learning (DRL) based MAC protocol for a particular network, with the objective of achieving global $\alpha$-fairness. In the conventional DRL framework, the feedback/reward given to the agent is always correctly received, so that the agent can optimize its strategy based on the received reward. In our wireless application, where the channels are noisy, the feedback/reward (i.e., the ACK packet) may be lost due to channel noise and interference. Without correct feedback, the agent (i.e., the network user) may fail to find a good solution. Moreover, in the distributed protocol, each agent makes decisions on its own. It is a challenge to guarantee that the multiple agents make coherent decisions and work together toward the same objective, particularly in the face of imperfect feedback channels. To tackle this challenge, we put forth (i) a feedback recovery mechanism to recover missing feedback information, and (ii) a two-stage action selection mechanism to aid coherent decision making and reduce transmission collisions among the agents. Extensive simulation results demonstrate the effectiveness of these two mechanisms. Last but not least, we believe that the feedback recovery mechanism and the two-stage action selection mechanism can also be used in general distributed multi-agent reinforcement learning problems in which the feedback information on rewards can be corrupted.
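
For reference, the global $\alpha$-fairness objective mentioned in the abstract presumably follows the standard $\alpha$-fair utility of a throughput $x > 0$ (this is the common textbook definition, not a formula quoted from the paper):

$$
U_\alpha(x) =
\begin{cases}
\dfrac{x^{1-\alpha}}{1-\alpha}, & \alpha \ge 0,\ \alpha \neq 1,\\[4pt]
\log x, & \alpha = 1,
\end{cases}
$$

where $\alpha = 0$ recovers sum-throughput maximization, $\alpha = 1$ proportional fairness, and $\alpha \to \infty$ approaches max-min fairness.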
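
As a rough illustration of the imperfect-feedback setting described above (a hypothetical sketch, not the paper's algorithm; step, p_success, and p_ack_loss are made-up names and parameters): when the ACK that carries the reward can be lost, the agent must distinguish missing feedback from an observed reward of zero.

# Hypothetical illustration of reward feedback loss over a noisy channel.
# A missing ACK (None) is not the same as an observed failed transmission (0.0);
# conflating the two biases the agent's estimate of its own success rate.
import random

def step(transmit: bool, p_success: float = 0.8, p_ack_loss: float = 0.2):
    """One time slot: return the reward as observed by the agent."""
    if not transmit:
        return 0.0                        # agent stayed silent; no throughput
    delivered = random.random() < p_success
    reward = 1.0 if delivered else 0.0    # true reward at the receiver
    ack_lost = random.random() < p_ack_loss
    return None if ack_lost else reward   # None = feedback lost, to be recovered

# A naive agent treating None as 0.0 under-estimates its success probability;
# the feedback recovery mechanism proposed in the paper targets this gap.
observed = [step(True) for _ in range(10)]
print(observed)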