Paper title
SwISS: A Scalable Markov chain Monte Carlo Divide-and-Conquer Strategy
Paper authors
Paper abstract
Divide-and-conquer strategies for Monte Carlo algorithms are an increasingly popular approach to making Bayesian inference scalable to large data sets. In its simplest form, the data are partitioned across multiple computing cores, and a separate Markov chain Monte Carlo algorithm on each core targets the associated partial posterior distribution, which we refer to as a sub-posterior; that is, the posterior given only the data from the segment of the partition associated with that core. Divide-and-conquer techniques reduce computational, memory and disk bottlenecks, but make it difficult to recombine the sub-posterior samples. We propose SwISS (Sub-posteriors with Inflation, Scaling and Shifting), a new approach for recombining the sub-posterior samples which is simple to apply, scales to high-dimensional parameter spaces and accurately approximates the original posterior distribution through affine transformations of the sub-posterior samples. We prove that our transformation is asymptotically optimal across a natural set of affine transformations and illustrate the efficacy of SwISS against competing algorithms on synthetic and real-world data sets.
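The abstract does not state the transformation itself, so the following is only an illustrative sketch of what recombining sub-posterior samples by affine transformation could look like, not the authors' exact method. It assumes each sub-posterior is roughly Gaussian, takes the product of the fitted Gaussians as the target, and shifts and rescales each core's samples so that their empirical mean and covariance match that target. The function names `sym_power` and `recombine_affine` are hypothetical, as is the choice of symmetric matrix square roots.

```python
import numpy as np


def sym_power(mat, power):
    """Symmetric matrix power of a symmetric positive-definite matrix."""
    eigvals, eigvecs = np.linalg.eigh(mat)
    return (eigvecs * eigvals**power) @ eigvecs.T


def recombine_affine(sub_samples):
    """Pool sub-posterior samples after an affine map per core.

    Assumption (not taken from the paper): the target mean and covariance
    are those of the product of Gaussians fitted to each sub-posterior.
    sub_samples is a list of arrays, each of shape (n_s, d) with d >= 1.
    """
    means = [x.mean(axis=0) for x in sub_samples]
    covs = [np.atleast_2d(np.cov(x, rowvar=False)) for x in sub_samples]
    precisions = [np.linalg.inv(c) for c in covs]

    # Gaussian product: precisions add, means are precision-weighted.
    target_cov = np.linalg.inv(sum(precisions))
    target_mean = target_cov @ sum(p @ m for p, m in zip(precisions, means))

    target_root = sym_power(target_cov, 0.5)
    transformed = []
    for x, m, c in zip(sub_samples, means, covs):
        # Scale: maps the core's covariance c onto target_cov.
        a = target_root @ sym_power(c, -0.5)
        # Shift: centre on the core's mean, then move to the target mean.
        transformed.append(target_mean + (x - m) @ a.T)
    return np.vstack(transformed), target_mean, target_cov
```

A call such as `pooled, mu, V = recombine_affine([chain_0, chain_1, chain_2])`, where each `chain_k` is a hypothetical array of draws from one core, returns a single pooled sample set. Because each core's samples are mapped individually, the cost is linear in the number of samples and involves only d-by-d matrix operations, which is in keeping with the abstract's claim that the approach is simple to apply in high dimensions.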