Title
Deep Contextual Bandits for Orchestrating Multi-User MISO Systems with Multiple RISs
Authors
Abstract
The emergent technology of Reconfigurable Intelligent Surfaces (RISs) has the potential to transform wireless environments into controllable systems through programmable propagation of information-bearing signals. Techniques stemming from the field of Deep Reinforcement Learning (DRL) have recently gained popularity for maximizing the sum-rate performance of multi-user communication systems empowered by RISs. Such approaches are commonly based on Markov Decision Processes (MDPs). In this paper, we instead investigate the sum-rate design problem within the Multi-Armed Bandit (MAB) setting, which is a relaxation of the MDP framework. Nevertheless, in many cases, the MAB formulation is better suited to the channel and system models under the assumptions typically made in the RIS literature. To this end, we propose a simpler DRL approach for orchestrating multiple metasurfaces in RIS-empowered multi-user Multiple-Input Single-Output (MISO) systems, which we numerically show to perform as well as a state-of-the-art MDP-based approach while being less computationally demanding.
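The argument for the bandit formulation can be made concrete. In an MDP, the value of an action adds a discounted expected future value to the immediate reward. When the next channel realization is drawn independently of the current RIS configuration, that future term is the same for every action, so the best configuration for a given context is simply the one that maximizes the immediate sum-rate. The Python sketch that follows illustrates this under hypothetical assumptions that are not the paper's. It uses a single RIS with N 1-bit elements instead of multiple RISs, maximum-ratio transmission (MRT) precoding with an equal power split, i.i.d. Rayleigh links, and a lookup table over a few discrete channel "scenes" in place of a deep network over continuous channel contexts. The names `CODEBOOK`, `sum_rate`, `perturb` and `run_bandit` are illustrative only.

```python
import math
import random

# Hypothetical toy model (not the paper's system model): a single RIS with N
# 1-bit elements, an M-antenna BS, K single-antenna users, MRT precoding with
# equal power split, and a finite set of "scenes" acting as discrete contexts.
M, K, N = 2, 2, 4
P, NOISE = 1.0, 0.1
# Action space: every 1-bit phase profile (each element reflects with +1 or -1).
CODEBOOK = [[1 - 2 * ((a >> n) & 1) for n in range(N)] for a in range(2 ** N)]


def cgauss(rng, std=1.0):
    """Circularly symmetric complex Gaussian sample with variance std**2."""
    s = std / math.sqrt(2)
    return complex(rng.gauss(0, s), rng.gauss(0, s))


def draw_links(rng, std=1.0):
    hd = [[cgauss(rng, std) for _ in range(M)] for _ in range(K)]  # BS -> user k
    hr = [[cgauss(rng, std) for _ in range(N)] for _ in range(K)]  # RIS -> user k
    G = [[cgauss(rng, std) for _ in range(M)] for _ in range(N)]   # BS -> RIS
    return hd, hr, G


def perturb(rng, scene, std=0.1):
    """Fresh small-scale fading around a fixed scene; independent of past actions."""
    delta = draw_links(rng, std)
    return tuple(
        [[x + d for x, d in zip(row, drow)] for row, drow in zip(mat, dmat)]
        for mat, dmat in zip(scene, delta)
    )


def sum_rate(links, phases):
    hd, hr, G = links
    # Effective channel: h_k[m] = hd[k][m] + sum_n hr[k][n] * phi_n * G[n][m]
    h = [[hd[k][m] + sum(hr[k][n] * phases[n] * G[n][m] for n in range(N))
          for m in range(M)] for k in range(K)]
    # MRT beamformers w_j = sqrt(P/K) * conj(h_j) / ||h_j||
    w = []
    for hk in h:
        norm = math.sqrt(sum(abs(x) ** 2 for x in hk))
        w.append([math.sqrt(P / K) * x.conjugate() / norm for x in hk])
    total = 0.0
    for k in range(K):
        gains = [abs(sum(h[k][m] * w[j][m] for m in range(M))) ** 2 for j in range(K)]
        sinr = gains[k] / (sum(gains) - gains[k] + NOISE)
        total += math.log2(1 + sinr)
    return total


def run_bandit(steps=3000, n_scenes=3, eps=0.1, seed=0):
    """Tabular epsilon-greedy contextual bandit over (scene, phase profile)."""
    rng = random.Random(seed)
    scenes = [draw_links(rng) for _ in range(n_scenes)]
    A = len(CODEBOOK)
    q = [[0.0] * A for _ in range(n_scenes)]  # running-mean sum-rate estimates
    cnt = [[0] * A for _ in range(n_scenes)]
    for _ in range(steps):
        c = rng.randrange(n_scenes)  # next context does not depend on the last action
        links = perturb(rng, scenes[c])
        untried = [i for i in range(A) if cnt[c][i] == 0]
        if untried:
            a = untried[0]                            # try every profile once per context
        elif rng.random() < eps:
            a = rng.randrange(A)                      # explore
        else:
            a = max(range(A), key=lambda i: q[c][i])  # exploit
        r = sum_rate(links, CODEBOOK[a])
        cnt[c][a] += 1
        # Bandit update: the target is the immediate reward only, with no
        # bootstrapped future-value term as in an MDP/Q-learning update.
        q[c][a] += (r - q[c][a]) / cnt[c][a]
    return scenes, q, cnt


if __name__ == "__main__":
    scenes, q, cnt = run_bandit()
    for c, scene in enumerate(scenes):
        learned = max(range(len(CODEBOOK)), key=lambda i: q[c][i])
        oracle = max(sum_rate(scene, p) for p in CODEBOOK)
        print(c, CODEBOOK[learned], round(sum_rate(scene, CODEBOOK[learned]), 3), round(oracle, 3))
```

Running the script prints, for each scene, the learned phase profile, its sum-rate on that scene, and the exhaustive-search optimum over the codebook. A Q-learning update would add a discounted estimate of the next state's value to the target in the `q[c][a]` line. This formulation omits that term, which is what makes it cheaper.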