Paper Title

Scalable Deep Reinforcement Learning for Routing and Spectrum Access in Physical Layer

Authors

Wei Cui, Wei Yu

Abstract

This paper proposes a novel scalable reinforcement learning approach for simultaneous routing and spectrum access in wireless ad-hoc networks. In most previous works on reinforcement learning for network optimization, the network topology is assumed to be fixed, and a different agent is trained for each transmission node -- this limits scalability and generalizability. Further, routing and spectrum access are typically treated as separate tasks. Moreover, the optimization objective is usually a cumulative metric along the route, e.g., number of hops or delay. In this paper, we account for the physical-layer signal-to-interference-plus-noise ratio (SINR) in a wireless network and further show that a bottleneck objective, such as the minimum SINR along the route, can also be optimized effectively using reinforcement learning. Specifically, we propose a scalable approach in which a single agent is associated with each flow and makes routing and spectrum access decisions as it moves along the frontier nodes. The agent is trained according to the physical-layer characteristics of the environment using a novel rewarding scheme based on the Monte Carlo estimation of the future bottleneck SINR. It learns to avoid interference by intelligently making joint routing and spectrum allocation decisions based on the geographical location information of the neighbouring nodes.
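
To make the bottleneck objective concrete, below is a minimal Python sketch (not the authors' code) that computes the per-hop SINR under a simple distance-based path-loss model, takes the minimum over the hops of a route, and forms a Monte Carlo estimate of the future bottleneck SINR from sampled route completions. The function names, the path-loss exponent, and the single-power interference model are illustrative assumptions, not details from the paper.

```python
import numpy as np

def link_sinr(tx_pos, rx_pos, interferer_pos, tx_power=1.0,
              noise_power=1e-9, path_loss_exp=3.0):
    """SINR of one hop: received signal power over interference plus noise.
    Assumes a simple 1/d^alpha path-loss model (illustrative only)."""
    def channel_gain(a, b):
        d = np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float))
        return 1.0 / max(d, 1e-3) ** path_loss_exp

    signal = tx_power * channel_gain(tx_pos, rx_pos)
    interference = sum(tx_power * channel_gain(p, rx_pos) for p in interferer_pos)
    return signal / (interference + noise_power)


def bottleneck_sinr(route, interferers_per_hop):
    """Bottleneck objective: the minimum SINR over all hops of a route."""
    return min(link_sinr(route[i], route[i + 1], interferers_per_hop[i])
               for i in range(len(route) - 1))


def monte_carlo_bottleneck_estimate(partial_route, sampled_completions, interferers_for):
    """Monte Carlo estimate of the future bottleneck SINR of a partial route:
    average the bottleneck SINR over sampled completions of that route.
    `interferers_for` is a hypothetical helper mapping a full route to the
    per-hop interferer positions."""
    values = [bottleneck_sinr(partial_route + completion,
                              interferers_for(partial_route + completion))
              for completion in sampled_completions]
    return float(np.mean(values))


# Example: a three-hop route with one interfering transmitter near the middle hop.
route = [(0, 0), (50, 10), (100, 0), (150, 5)]
interferers = [[(60, 60)], [(60, 60)], [(60, 60)]]
print(bottleneck_sinr(route, interferers))
```

Unlike a cumulative metric such as hop count or delay, the minimum operator makes the value of a route depend on its single worst link, which is the quantity the proposed rewarding scheme estimates and optimizes.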
