Robust Pareto Set Identification with Contaminated Bandit Feedback

Paper Title

Robust Pareto Set Identification with Contaminated Bandit Feedback

Authors

Korkmaz, İlter Onat; Ceyani, Efe Eren; Bozgan, Kerem; Tekin, Cem

Abstract

We consider the Pareto set identification (PSI) problem in multi-objective multi-armed bandits (MO-MAB) with contaminated reward observations. At each arm pull, with some fixed probability, the true reward samples are replaced with samples from an arbitrary contamination distribution chosen by an adversary. We consider (α, δ)-PAC PSI and propose a sample-median-based multi-objective adaptive elimination algorithm that returns an (α, δ)-PAC Pareto set upon termination, with a sample complexity bound that depends on the contamination probability. As the contamination probability decreases, we recover the well-known sample complexity results in MO-MAB. We compare the proposed algorithm with a mean-based method from the MO-MAB literature, as well as an extended version of that method that uses median estimators, on several PSI problems under adversarial corruptions, including review bombing and diabetes management. Our numerical results support our theoretical findings and demonstrate that robust algorithm design is crucial for accurate PSI under contaminated reward observations.
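The contrast the abstract draws between mean-based and median-based estimation can be illustrated with a small simulation. The sketch below is not the paper's adaptive elimination algorithm: it draws a fixed number of samples per arm, and the arm means, noise level, contamination probability `eps`, and the adversary's point-mass outliers are all assumptions chosen for illustration. It compares the Pareto set computed from coordinate-wise sample means with the one computed from coordinate-wise sample medians.

```python
import numpy as np

def pull(rng, true_mean, eps, n, outlier, noise=0.1):
    """Draw n reward vectors for one arm under an assumed contamination model."""
    d = len(true_mean)
    samples = rng.normal(loc=true_mean, scale=noise, size=(n, d))
    corrupted = rng.random(n) < eps      # each pull is corrupted with probability eps
    samples[corrupted] = outlier         # assumed adversary: a point mass at `outlier`
    return samples

def pareto_set(estimates):
    """Indices of arms not dominated by any other arm (larger is better)."""
    est = np.asarray(estimates)
    keep = []
    for i in range(len(est)):
        dominated = any(
            np.all(est[j] >= est[i]) and np.any(est[j] > est[i])
            for j in range(len(est)) if j != i
        )
        if not dominated:
            keep.append(i)
    return keep

rng = np.random.default_rng(0)

# Hypothetical two-objective problem: arms 0-2 are Pareto optimal, arm 3 is dominated.
true_means = [(1.0, 0.3), (0.3, 1.0), (0.6, 0.6), (0.1, 0.1)]
# Assumed adversary: drags the optimal arms down and inflates the dominated arm.
outliers = [-10.0, -10.0, -10.0, 10.0]

data = [pull(rng, m, eps=0.2, n=2000, outlier=o)
        for m, o in zip(true_means, outliers)]

mean_est = [s.mean(axis=0) for s in data]          # sensitive to the outliers
median_est = [np.median(s, axis=0) for s in data]  # shifts only slightly

print("true Pareto set:  ", pareto_set(true_means))
print("mean-based set:   ", pareto_set(mean_est))
print("median-based set: ", pareto_set(median_est))
```

In this setup the corrupted samples pull each sample mean far from the true mean, while each median moves by only a small amount that grows with `eps`. That is consistent with the abstract's statement that the sample complexity bound depends on the contamination probability.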
