Paper Title

Resolving the Mixing Time of the Langevin Algorithm to its Stationary Distribution for Log-Concave Sampling

Paper Authors

Jason M. Altschuler and Kunal Talwar

Paper Abstract

Sampling from a high-dimensional distribution is a fundamental task in statistics, engineering, and the sciences. A canonical approach is the Langevin Algorithm, i.e., the Markov chain for the discretized Langevin Diffusion. This is the sampling analog of Gradient Descent. Despite being studied for several decades in multiple communities, tight mixing bounds for this algorithm remain unresolved even in the seemingly simple setting of log-concave distributions over a bounded domain. This paper completely characterizes the mixing time of the Langevin Algorithm to its stationary distribution in this setting (and others). This mixing result can be combined with any bound on the discretization bias in order to sample from the stationary distribution of the continuous Langevin Diffusion. In this way, we disentangle the study of the mixing and bias of the Langevin Algorithm. Our key insight is to introduce a technique from the differential privacy literature to the sampling literature. This technique, called Privacy Amplification by Iteration, uses as a potential a variant of Rényi divergence that is made geometrically aware via Optimal Transport smoothing. This gives a short, simple proof of optimal mixing bounds and has several additional appealing properties. First, our approach removes all unnecessary assumptions required by other sampling analyses. Second, our approach unifies many settings: it extends unchanged if the Langevin Algorithm uses projections, stochastic mini-batch gradients, or strongly convex potentials (whereby our mixing time improves exponentially). Third, our approach exploits convexity only through the contractivity of a gradient step -- reminiscent of how convexity is used in textbook proofs of Gradient Descent. In this way, we offer a new approach towards further unifying the analyses of optimization and sampling algorithms.
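To make the algorithm referenced in the abstract concrete, below is a minimal sketch of the projected Langevin Algorithm: a gradient step on the potential plus Gaussian noise, followed by projection back onto the convex domain. The quadratic potential, step size `eta`, iteration count, and ball radius are illustrative choices for this sketch, not parameters taken from the paper.

```python
import numpy as np

def grad_f(x):
    # Illustrative strongly convex potential f(x) = ||x||^2 / 2, so grad f(x) = x.
    return x

def project_ball(x, radius=1.0):
    # Euclidean projection onto the ball of the given radius (the convex domain K).
    norm = np.linalg.norm(x)
    return x if norm <= radius else x * (radius / norm)

def langevin_algorithm(x0, eta=1e-3, n_steps=10_000, radius=1.0, rng=None):
    """Projected Langevin Algorithm targeting (approximately) pi proportional to exp(-f) on K."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        noise = rng.standard_normal(x.shape)
        # x_{k+1} = Proj_K( x_k - eta * grad f(x_k) + sqrt(2 * eta) * xi_k ),  xi_k ~ N(0, I)
        x = project_ball(x - eta * grad_f(x) + np.sqrt(2.0 * eta) * noise, radius)
    return x

# Usage: draw one approximate sample in dimension 10, starting from the origin.
sample = langevin_algorithm(np.zeros(10))
```

Note that the chain above converges to its own stationary distribution, which differs from exp(-f) by a discretization bias; the paper's mixing bounds concern convergence to that stationary distribution and can be combined with any bias bound, as the abstract explains.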
