Paper Title
Optimization-driven Hierarchical Learning Framework for Wireless Powered Backscatter-aided Relay Communications
Authors
Abstract
In this paper, we employ multiple wireless-powered relays to assist information transmission from a multi-antenna access point to a single-antenna receiver. The wireless relays can operate in either the passive mode via backscatter communications or the active mode via RF communications, depending on their channel conditions and energy states. We aim to maximize the overall throughput by jointly optimizing the access point's beamforming and the relays' radio modes and operating parameters. Because the problem is non-convex and combinatorial, we develop a novel optimization-driven hierarchical deep deterministic policy gradient (H-DDPG) approach that adapts the beamforming and relay strategies dynamically. First, the optimization-driven H-DDPG algorithm decomposes the problem: an outer-loop deep Q-network (DQN) algorithm handles the binary relay mode selection, and an inner-loop DDPG algorithm optimizes the continuous beamforming and relaying parameters. Second, to improve learning efficiency, we integrate model-based optimization into the DDPG framework by providing a better-informed target estimate for DNN training. Simulation results reveal that these two designs ensure more stable learning and achieve higher reward performance, by up to nearly 20%, compared to the conventional DDPG approach.
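The two-level structure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`enumerate_modes`, `select_modes`, `actor`, `informed_target`), the toy linear actor, and the table of Q-values are all hypothetical. In particular, taking the larger of the critic's estimate and a model-based estimate is only one plausible reading of "better-informed target estimate"; the abstract does not state the exact rule.

```python
import itertools
import math
import random


def enumerate_modes(num_relays):
    """List every binary mode vector: 0 = passive (backscatter), 1 = active (RF)."""
    return list(itertools.product((0, 1), repeat=num_relays))


def select_modes(q_values, epsilon, rng):
    """Outer loop (DQN-style): epsilon-greedy pick of a mode-vector index."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda i: q_values[i])  # exploit


def actor(state, modes, weights):
    """Inner loop (stand-in for a DDPG actor): deterministic map from
    (state, chosen modes) to continuous parameters squashed into (0, 1)."""
    features = list(state) + [float(m) for m in modes]
    outputs = []
    for row in weights:
        z = sum(w * f for w, f in zip(row, features))
        outputs.append(1.0 / (1.0 + math.exp(-z)))  # sigmoid
    return outputs


def informed_target(reward, gamma, critic_next, model_based_next):
    """ASSUMED rule: bootstrap from the larger of the critic's next-state
    estimate and an estimate obtained from a model-based optimization."""
    return reward + gamma * max(critic_next, model_based_next)


if __name__ == "__main__":
    rng = random.Random(0)
    modes = enumerate_modes(2)            # 4 candidate mode vectors
    q_values = [0.1, 0.9, 0.3, 0.2]       # hypothetical outer-loop Q-values
    idx = select_modes(q_values, epsilon=0.1, rng=rng)
    params = actor((0.2, 0.4), modes[idx], [[0.0] * 4, [0.0] * 4])
    print(modes[idx], params, informed_target(1.0, 0.5, 2.0, 4.0))
```

Enumerating all mode vectors grows as 2^N in the number of relays, so this sketch only makes sense for a handful of relays. A real outer loop would replace the Q-value table with a trained network.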