Paper Title

Deep Echo State Q-Network (DEQN) and Its Application in Dynamic Spectrum Sharing for 5G and Beyond

Authors

Hao-Hsuan Chang, Lingjia Liu, Yang Yi

Abstract

Deep reinforcement learning (DRL) has been shown to be successful in many application domains. Combining recurrent neural networks (RNNs) with DRL further enables DRL to operate in non-Markovian environments by capturing temporal information. However, training both DRL agents and RNNs is known to be challenging, requiring a large amount of training data to achieve convergence. In many targeted applications, such as those in fifth-generation (5G) cellular communications, the environment is highly dynamic while the available training data is very limited. It is therefore extremely important to develop DRL strategies that can capture the temporal correlations of a dynamic environment while requiring limited training overhead. In this paper, we introduce the deep echo state Q-network (DEQN), which can adapt to a highly dynamic environment in a short period of time with limited training data. We evaluate the performance of the introduced DEQN method in the dynamic spectrum sharing (DSS) scenario, a promising technology for increasing spectrum utilization in 5G and future 6G networks. In contrast to conventional spectrum management policies that grant a fixed spectrum band to a single system for exclusive access, DSS allows a secondary system to share the spectrum with the primary system. Our work sheds light on applying an efficient DRL framework in highly dynamic environments with limited available training data.
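The key idea behind an echo state Q-network is that the recurrent part of the network is a fixed, randomly initialized reservoir, so only a linear readout from reservoir states to Q-values has to be trained; this is what makes learning feasible with limited training data. The sketch below illustrates that structure in a minimal form. It is an assumption-laden illustration, not the authors' implementation: the class name, the spectral-radius scaling value, and the simple per-action TD update on the readout are all choices made here for clarity.

```python
import numpy as np

class EchoStateQNetwork:
    """Minimal echo state Q-network sketch (illustrative only).

    The input and recurrent weights (W_in, W_res) are random and fixed;
    only the linear readout W_out, mapping reservoir states to Q-values,
    is trained. The fixed reservoir still captures temporal correlation
    in the observation sequence, which is why far fewer samples are
    needed than when training a full RNN end to end.
    """

    def __init__(self, n_inputs, n_reservoir, n_actions,
                 spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-0.5, 0.5, (n_reservoir, n_inputs))
        W = rng.uniform(-0.5, 0.5, (n_reservoir, n_reservoir))
        # Scale recurrent weights so the spectral radius is below 1,
        # a standard sufficient-in-practice condition for the echo
        # state property (fading memory of past inputs).
        W *= spectral_radius / max(abs(np.linalg.eigvals(W)))
        self.W_res = W
        self.W_out = np.zeros((n_actions, n_reservoir))
        self.state = np.zeros(n_reservoir)

    def reset(self):
        # Clear reservoir memory at the start of an episode.
        self.state = np.zeros_like(self.state)

    def step(self, obs):
        # Advance the (untrained) reservoir and read out Q-values.
        self.state = np.tanh(self.W_in @ obs + self.W_res @ self.state)
        return self.W_out @ self.state  # one Q-value per action

    def td_update(self, reservoir_state, action, target, lr=0.01):
        # Train only the readout row for the taken action with a
        # squared-TD-error gradient step.
        td_error = target - self.W_out[action] @ reservoir_state
        self.W_out[action] += lr * td_error * reservoir_state
```

In a DSS setting, `obs` would be a secondary user's local spectrum observation and the actions would be channel-access decisions; the readout-only update is what allows the agent to re-adapt quickly when the primary system's activity pattern changes.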
