Paper Title
Online Learning for Non-monotone Submodular Maximization: From Full Information to Bandit Feedback
Paper Authors
Abstract
In this paper, we revisit the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set, which finds wide real-world applications in the domains of machine learning, economics, and operations research. First, we present the Meta-MFW algorithm, which achieves a $1/e$-regret of $O(\sqrt{T})$ at the cost of $T^{3/2}$ stochastic gradient evaluations per round. To the best of our knowledge, Meta-MFW is the first algorithm to obtain a $1/e$-regret of $O(\sqrt{T})$ for the online non-monotone continuous DR-submodular maximization problem over a down-closed convex set. Furthermore, in sharp contrast with the ODC algorithm \citep{thang2021online}, Meta-MFW relies on a simple online linear oracle and requires no discretization, lifting, or rounding operations. In view of practical restrictions, we then propose the Mono-MFW algorithm, which reduces the number of per-function stochastic gradient evaluations from $T^{3/2}$ to one and achieves a $1/e$-regret bound of $O(T^{4/5})$. Next, we extend Mono-MFW to the bandit setting and propose the Bandit-MFW algorithm, which attains a $1/e$-regret bound of $O(T^{8/9})$. To the best of our knowledge, Mono-MFW and Bandit-MFW are the first sublinear-regret algorithms for the one-shot and bandit settings, respectively, of online non-monotone continuous DR-submodular maximization over a down-closed convex set. Finally, we conduct numerical experiments on both synthetic and real-world datasets to verify the effectiveness of our methods.
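The Frank-Wolfe-style algorithms summarized above all reduce each update to a call to a linear-maximization oracle over the down-closed feasible set. As a minimal illustrative sketch only (not the paper's Meta-MFW, Mono-MFW, or Bandit-MFW), the code below shows an offline "measured" Frank-Wolfe iteration of that general kind on a toy non-monotone DR-submodular objective; the quadratic objective, the simplex-like feasible set, and the function names `grad_f` and `linear_oracle` are all illustrative assumptions, not constructs from the paper.

```python
import numpy as np

# Toy objective f(x) = a.x - 0.5 x^T B x with B entrywise non-negative:
# all cross second derivatives are <= 0, so f is DR-submodular (and
# generally non-monotone). This is an illustrative choice, not the
# paper's benchmark.
def grad_f(x, a, B):
    """Gradient of f(x) = a.x - 0.5 x^T B x."""
    return a - B @ x

def linear_oracle(g, budget=1.0):
    """argmax_{v in K} <g, v> over the down-closed convex set
    K = {v in [0,1]^n : sum(v) <= budget}: put all budget on the
    single coordinate with the largest positive gradient entry."""
    v = np.zeros_like(g)
    i = np.argmax(g)
    if g[i] > 0:
        v[i] = min(1.0, budget)
    return v

def measured_frank_wolfe(a, B, K=100):
    """Measured Frank-Wolfe sketch: the (1 - x) factor damps the step
    coordinate-wise, keeping iterates inside [0,1]^n and inside K."""
    x = np.zeros_like(a)
    for _ in range(K):
        v = linear_oracle(grad_f(x, a, B))
        x = x + (1.0 / K) * (1.0 - x) * v
    return x

a = np.array([1.0, 0.8, 0.3])
B = np.array([[0.2, 0.1, 0.0],
              [0.1, 0.2, 0.1],
              [0.0, 0.1, 0.2]])
x = measured_frank_wolfe(a, B)
print(np.round(x, 3))
```

The online algorithms in the paper run oracle-driven updates of this flavor against a sequence of adversarially chosen functions, with the full-information, one-shot, and bandit variants differing in how many stochastic gradient evaluations (or function values) are available per round.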