Reinforcement Learning for Improved Random Access in Delay-Constrained Heterogeneous Wireless Networks

Paper Title

Reinforcement Learning for Improved Random Access in Delay-Constrained Heterogeneous Wireless Networks

Authors

Lei Deng, Danzhou Wu, Zilong Liu, Yijin Zhang, Yunghsiang S. Han

Abstract

In this paper, we investigate, for the first time, the random access problem in a delay-constrained heterogeneous wireless network. We begin with a simple two-device problem in which two devices deliver delay-constrained traffic to an access point (AP) over a common unreliable collision channel. Assuming that one device (Device 1) adopts ALOHA, we aim to optimize the random access scheme of the other device (Device 2). The most intriguing part of this problem is that Device 2 has no information about Device 1 yet must maximize the system timely throughput. We first propose a Markov decision process (MDP) formulation to derive a model-based upper bound, which quantifies the performance gap of a given random access scheme. We then use reinforcement learning (RL) to design an R-learning-based random access scheme, called tiny state-space R-learning random access (TSRA), and subsequently extend it to the general multi-device problem. Extensive simulations show that the proposed TSRA simultaneously achieves higher timely throughput, lower computational complexity, and lower power consumption than the existing baseline, deep-reinforcement learning multiple access (DLMA). This indicates that TSRA is a promising means of efficient random access for massive numbers of mobile devices with limited computation and battery capabilities.
