Paper Title

Preferential Subsampling for Stochastic Gradient Langevin Dynamics

Authors

Srshti Putcha, Christopher Nemeth, Paul Fearnhead

Abstract


Stochastic gradient MCMC (SGMCMC) offers a scalable alternative to traditional MCMC, by constructing an unbiased estimate of the gradient of the log-posterior with a small, uniformly-weighted subsample of the data. While efficient to compute, the resulting gradient estimator may exhibit a high variance and impact sampler performance. The problem of variance control has been traditionally addressed by constructing a better stochastic gradient estimator, often using control variates. We propose to use a discrete, non-uniform probability distribution to preferentially subsample data points that have a greater impact on the stochastic gradient. In addition, we present a method of adaptively adjusting the subsample size at each iteration of the algorithm, so that we increase the subsample size in areas of the sample space where the gradient is harder to estimate. We demonstrate that such an approach can maintain the same level of accuracy while substantially reducing the average subsample size that is used.
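The core idea in the abstract can be illustrated with a minimal SGLD sketch on a toy model. This is a hypothetical illustration, not the paper's actual scheme: the model (posterior over the mean of a unit-variance Gaussian under a flat prior), the choice of subsampling probabilities (proportional to per-point gradient magnitude), and all function names are assumptions made for the example. The key mechanics match the abstract: indices are drawn from a discrete, non-uniform distribution, and importance weights keep the gradient estimator unbiased.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem: posterior over the mean of a unit-variance Gaussian, flat prior.
N = 1000
data = rng.normal(loc=2.0, scale=1.0, size=N)

def grad_log_lik(theta, x):
    # d/dtheta log N(x | theta, 1) for each data point x.
    return x - theta

def sgld_preferential(data, n_iters=2000, m=10, h=1e-3):
    """SGLD where each iteration subsamples m points from a discrete,
    non-uniform distribution p_i (here: proportional to the per-point
    gradient magnitude -- an illustrative choice, not the paper's).
    Dividing each sampled gradient by p_i keeps the estimator of
    sum_i grad log p(x_i | theta) unbiased."""
    n = len(data)
    theta, samples = 0.0, []
    for _ in range(n_iters):
        # Preferential (non-uniform) subsampling probabilities.
        scores = np.abs(grad_log_lik(theta, data)) + 1e-12
        p = scores / scores.sum()
        idx = rng.choice(n, size=m, replace=True, p=p)
        # Unbiased, importance-weighted estimate of the full-data gradient.
        g_hat = np.mean(grad_log_lik(theta, data[idx]) / p[idx])
        # Langevin update: half-step drift plus injected Gaussian noise.
        theta = theta + 0.5 * h * g_hat + np.sqrt(h) * rng.normal()
        samples.append(theta)
    return np.array(samples)

samples = sgld_preferential(data)
```

Under the flat prior the exact posterior is N(mean(data), 1/N), so the post-burn-in sample average should sit close to the data mean. The adaptive subsample-size idea from the abstract would replace the fixed `m` with a per-iteration size chosen from the current state, which this sketch omits.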
