Paper Title
On Private Online Convex Optimization: Optimal Algorithms in $\ell_p$-Geometry and High Dimensional Contextual Bandits
Paper Authors
Paper Abstract
Differentially private (DP) stochastic convex optimization (SCO) is ubiquitous in trustworthy machine learning algorithm design. This paper studies the DP-SCO problem with streaming data that are sampled from a distribution and arrive sequentially. We also consider the continual release model, in which parameters related to private information are updated and released upon the arrival of each new data point; algorithms in this setting are often known as online algorithms. Although numerous algorithms have been developed to achieve the optimal excess risk in different $\ell_p$ norm geometries, none of the existing ones can be adapted to the streaming and continual release setting. To address the challenge of online convex optimization with privacy protection, we propose a private variant of the online Frank-Wolfe algorithm with recursive gradients for variance reduction, which updates and reveals the parameters upon each data point. Combined with an adaptive differential privacy analysis, our online algorithm achieves, in linear time, the optimal excess risk when $1<p\leq 2$, and a state-of-the-art excess risk matching the non-private lower bound when $2<p\leq\infty$. Our algorithm can also be extended to the case $p=1$ to achieve nearly dimension-independent excess risk. While previous variance-reduction results for recursive gradients have theoretical guarantees only in the independent and identically distributed sample setting, we establish such a guarantee in a non-stationary setting. To demonstrate the virtues of our method, we design the first DP algorithm for high-dimensional generalized linear bandits with logarithmic regret. Comparative experiments with a variety of DP-SCO and DP-Bandit algorithms demonstrate the efficacy and practicality of the proposed algorithms.
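To make the described pipeline concrete, below is a minimal Python sketch of one way to combine a recursive (variance-reduced) gradient estimator with a privatized online Frank-Wolfe update in the continual release setting. It is illustrative only and not the paper's exact algorithm: the linear minimization oracle is written for the $\ell_2$ ball rather than a general $\ell_p$ geometry, and the mixing weight, step size, and Gaussian noise scale (noise_sigma) are placeholder choices; a faithful implementation would calibrate the noise to the gradient sensitivity and the privacy budget via the adaptive DP analysis mentioned in the abstract.

# A minimal, illustrative sketch of a differentially private online
# Frank-Wolfe step with a recursive (momentum-based) gradient estimator.
# All names, step sizes, and the Gaussian noise calibration below are
# assumptions for exposition, not the paper's exact algorithm.
import numpy as np

def lmo_l2_ball(g, radius=1.0):
    """Linear minimization oracle over the l2 ball: argmin_{||v||_2<=R} <g, v>."""
    norm = np.linalg.norm(g)
    return np.zeros_like(g) if norm == 0 else -radius * g / norm

def dp_online_frank_wolfe(grad_fn, data_stream, dim, radius=1.0,
                          noise_sigma=0.1, seed=0):
    """Streams over the data once, releasing an iterate after every point.

    grad_fn(x, z) returns the per-sample gradient at iterate x on data z.
    noise_sigma is a placeholder; a real DP guarantee requires calibrating
    it to the gradient sensitivity and the privacy budget (epsilon, delta).
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(dim)
    d = np.zeros(dim)          # recursive gradient estimator
    x_prev = x.copy()
    iterates = []
    for t, z in enumerate(data_stream, start=1):
        a = 1.0 / t            # illustrative mixing weight
        # Recursive (variance-reduced) update: reuse the previous estimate,
        # corrected by the gradient difference on the fresh sample z.
        d = grad_fn(x, z) + (1.0 - a) * (d - grad_fn(x_prev, z))
        # Privatize the direction that drives the release (sketch only).
        g_noisy = d + noise_sigma * rng.standard_normal(dim)
        v = lmo_l2_ball(g_noisy, radius)
        x_prev = x.copy()
        eta = 2.0 / (t + 1)    # classic Frank-Wolfe step size
        x = (1.0 - eta) * x + eta * v
        iterates.append(x.copy())  # parameter released at every step
    return iterates

The point the sketch captures is that only the noised direction g_noisy influences the released iterates, so each release can be accounted for under DP, while the recursive correction term reuses the previous estimate on each fresh sample to reduce the variance of the gradient estimate.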