Paper Title

Deep Reinforcement Learning for Uplink Multi-Carrier Non-Orthogonal Multiple Access Resource Allocation Using Buffer State Information

Paper Authors

Eike-Manuel Bansbach, Yigit Kiyak, Laurent Schmalen

Paper Abstract

For orthogonal multiple access (OMA) systems, the number of served user equipments (UEs) is limited to the number of available orthogonal resources. On the other hand, non-orthogonal multiple access (NOMA) schemes allow multiple UEs to use the same orthogonal resource. This extra degree of freedom introduces new challenges for resource allocation. Buffer state information (BSI), like the size and age of packets waiting for transmission, can be used to improve scheduling in OMA systems. In this paper, we investigate the impact of BSI on the performance of a centralized scheduler in an uplink multi-carrier NOMA scenario with UEs having various data rate and latency requirements. To handle the large combinatorial space of allocating UEs to the resources, we propose a novel scheduler based on actor-critic reinforcement learning incorporating BSI. Training and evaluation are carried out using Nokia's "wireless suite". We propose various novel techniques to both stabilize and speed up training. The proposed scheduler outperforms benchmark schedulers.
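
The abstract gives no implementation details, but the following minimal PyTorch sketch illustrates the general idea of an actor-critic scheduler whose observation concatenates per-UE channel quality with buffer state information (buffered data size and packet age). The class name, network sizes, per-UE feature layout, one-UE-per-resource action space, and the one-step advantage update are illustrative assumptions for this sketch, not the authors' implementation or the wireless-suite API.

```python
# Illustrative sketch only: a shared-trunk actor-critic whose input
# concatenates, per UE, a channel quality indicator with buffer state
# information (buffered bytes and head-of-line packet age). Feature layout,
# network sizes and the action space are assumptions, not the paper's design.
import torch
import torch.nn as nn


class BSIActorCritic(nn.Module):
    def __init__(self, num_ues: int, feats_per_ue: int = 3, hidden: int = 128):
        super().__init__()
        obs_dim = num_ues * feats_per_ue  # e.g. [CQI, buffered bytes, packet age] per UE
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.actor = nn.Linear(hidden, num_ues)   # logits: which UE gets the current resource
        self.critic = nn.Linear(hidden, 1)        # state-value baseline

    def forward(self, obs: torch.Tensor):
        h = self.trunk(obs)
        return self.actor(h), self.critic(h)


# Toy one-step update with placeholder environment feedback, just to show the data flow.
num_ues = 8
model = BSIActorCritic(num_ues)
obs = torch.rand(1, num_ues * 3)                       # stand-in for CQI + BSI features
logits, value = model(obs)
dist = torch.distributions.Categorical(logits=logits)
action = dist.sample()                                 # UE selected for this resource block
reward, next_value = torch.tensor(0.5), torch.tensor(0.0)  # placeholders for environment feedback
advantage = reward + 0.99 * next_value - value.squeeze()
loss = (-dist.log_prob(action) * advantage.detach() + advantage.pow(2)).mean()
loss.backward()                                        # gradients for both actor and critic heads
```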
