Paper Title
Stability of Stochastic Gradient Descent on Nonsmooth Convex Losses
Paper Authors
Paper Abstract
Uniform stability is a notion of algorithmic stability that bounds the worst-case change in the model output by the algorithm when a single data point in the dataset is replaced. An influential work of Hardt et al. (2016) provides strong upper bounds on the uniform stability of the stochastic gradient descent (SGD) algorithm on sufficiently smooth convex losses. These results led to important progress in understanding the generalization properties of SGD and to several applications to differentially private convex optimization for smooth losses. Our work is the first to address the uniform stability of SGD on nonsmooth convex losses. Specifically, we provide sharp upper and lower bounds for several forms of SGD and full-batch GD on arbitrary Lipschitz nonsmooth convex losses. Our lower bounds show that, in the nonsmooth case, (S)GD can be inherently less stable than in the smooth case. On the other hand, our upper bounds show that (S)GD is sufficiently stable to derive new and useful bounds on the generalization error. Most notably, we obtain the first dimension-independent generalization bounds for multi-pass SGD in the nonsmooth case. In addition, our bounds allow us to derive a new algorithm for differentially private nonsmooth stochastic convex optimization with optimal excess population risk. Our algorithm is simpler and more efficient than the best known algorithm for the nonsmooth case, due to Feldman et al. (2020).
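For context, uniform stability admits a standard formal definition; the sketch below uses notation of our choosing and is not copied from the paper. A randomized algorithm $A$ is $\varepsilon$-uniformly stable if, for every pair of datasets $S$ and $S'$ of size $n$ that differ in a single example and every point $z$,
\[
  \mathbb{E}_{A}\bigl[\ell(A(S), z) - \ell(A(S'), z)\bigr] \;\le\; \varepsilon,
\]
where $\ell$ is the loss function and the expectation is over the algorithm's internal randomness (for SGD, the random choice of indices). In the nonsmooth setting, the gradient step of (S)GD is typically replaced by a subgradient step, $w_{t+1} = \Pi_{\mathcal{W}}(w_t - \eta_t g_t)$ with $g_t \in \partial \ell(w_t; z_{i_t})$, where $\Pi_{\mathcal{W}}$ denotes Euclidean projection onto the feasible set $\mathcal{W}$ and $\eta_t$ is the step size.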
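To make concrete what one form of SGD on a Lipschitz nonsmooth convex loss can look like, the following is an illustrative sketch of projected subgradient SGD on the hinge loss. The function name, hyperparameters, and single-example sampling scheme are choices made here for illustration; the exact variants analyzed in the paper (e.g., step-size schedules and sampling with or without replacement) may differ.

import numpy as np

def projected_subgradient_sgd(X, y, lr=0.01, epochs=5, radius=1.0, seed=0):
    """Projected subgradient SGD on the hinge loss (nonsmooth, convex, Lipschitz).

    Illustrative sketch only; not the paper's exact algorithm.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):                      # multi-pass SGD: one pass per epoch
        for i in rng.permutation(n):             # visit examples in a random order
            margin = y[i] * np.dot(w, X[i])
            # Subgradient of max(0, 1 - y<w, x>): -y*x when the margin is violated, else 0.
            g = -y[i] * X[i] if margin < 1.0 else np.zeros(d)
            w = w - lr * g
            # Euclidean projection onto the ball of the given radius keeps w bounded.
            norm = np.linalg.norm(w)
            if norm > radius:
                w *= radius / norm
    return w

# Toy usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = np.sign(X[:, 0] + 0.1 * rng.normal(size=200))
w_hat = projected_subgradient_sgd(X, y)
print("learned weights:", w_hat)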