Regret of Age-of-Information Bandits

Paper Title

Regret of Age-of-Information Bandits

Authors

Fatale, Santosh; Bhandari, Kavya; Narula, Urvidh; Moharir, Sharayu; Hanawal, Manjesh Kumar

Abstract

We consider a system with a single source that measures or tracks a time-varying quantity and periodically attempts to report these measurements to a monitoring station. Each update from the source has to be scheduled on one of K available communication channels. The probability that an attempted communication succeeds is a function of the channel used, and this function is unknown to the scheduler. The metric of interest is the Age-of-Information (AoI), formally defined as the time elapsed since the destination received the most recent update from the source. We model the scheduling problem as a variant of the multi-armed bandit problem with communication channels as arms. We characterize a lower bound on the AoI regret achievable by any policy and characterize the performance of UCB, Thompson Sampling, and their variants. Our analytical results show that UCB and Thompson Sampling are order-optimal for AoI bandits. In addition, we propose novel policies which, unlike UCB and Thompson Sampling, use the current AoI to make scheduling decisions. Via simulations, we show that the proposed AoI-aware policies outperform existing AoI-agnostic policies.
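To make the setting in the abstract concrete, the following is a minimal simulation sketch, not the authors' code. It assumes Bernoulli channels, one transmission attempt per time slot, an AoI that resets to 1 on a successful delivery and otherwise grows by 1, and the standard UCB1 index. The function names (`simulate_aoi_ucb`, `simulate_aoi_fixed`) and the choice to compare cumulative AoI against an oracle that always uses the best channel are illustrative assumptions.

```python
import math
import random


def simulate_aoi_ucb(success_probs, horizon, seed=0):
    """Schedule updates with UCB1 over Bernoulli channels.

    Returns (cumulative AoI over the horizon, number of uses per channel).
    Assumed convention: AoI is 1 in a slot with a successful delivery,
    otherwise it increases by 1.
    """
    rng = random.Random(seed)
    k = len(success_probs)
    pulls = [0] * k       # how many times each channel was used
    successes = [0] * k   # successful deliveries per channel
    aoi = 1
    cumulative_aoi = 0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # use every channel once before computing indices
        else:
            # UCB1 index: empirical success rate plus exploration bonus
            arm = max(
                range(k),
                key=lambda i: successes[i] / pulls[i]
                + math.sqrt(2.0 * math.log(t) / pulls[i]),
            )
        delivered = rng.random() < success_probs[arm]
        pulls[arm] += 1
        successes[arm] += int(delivered)
        aoi = 1 if delivered else aoi + 1
        cumulative_aoi += aoi
    return cumulative_aoi, pulls


def simulate_aoi_fixed(success_probs, arm, horizon, seed=0):
    """Cumulative AoI when the same channel is used in every slot."""
    rng = random.Random(seed)
    aoi = 1
    cumulative_aoi = 0
    for _ in range(horizon):
        delivered = rng.random() < success_probs[arm]
        aoi = 1 if delivered else aoi + 1
        cumulative_aoi += aoi
    return cumulative_aoi


if __name__ == "__main__":
    probs = [0.3, 0.5, 0.8]   # hypothetical channel success probabilities
    horizon = 5000
    ucb_aoi, pulls = simulate_aoi_ucb(probs, horizon, seed=1)
    best = max(range(len(probs)), key=lambda i: probs[i])
    oracle_aoi = simulate_aoi_fixed(probs, best, horizon, seed=2)
    print("channel uses:", pulls)
    print("single-run AoI gap to oracle:", ucb_aoi - oracle_aoi)
```

A single run gives only a noisy estimate of the gap; averaging over many seeds would be needed to approximate an expected AoI regret. This sketch is AoI-agnostic: the AoI-aware policies mentioned in the abstract additionally use the current AoI when choosing a channel, and their details are not reproduced here.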
