Slice Sampling for General Completely Random Measures

Paper title

Slice Sampling for General Completely Random Measures

Authors

Peiyuan Zhu, Alexandre Bouchard-Côté, Trevor Campbell

Abstract

Completely random measures provide a principled approach to creating flexible unsupervised models, where the number of latent features is infinite and the number of features that influence the data grows with the size of the data set. Because there are infinitely many latent features, posterior inference requires either marginalization, which results in dependence structures that prevent efficient computation via parallelization and conjugacy, or finite truncation, which arbitrarily limits the flexibility of the model. In this paper we present a novel Markov chain Monte Carlo algorithm for posterior inference that adaptively sets the truncation level using auxiliary slice variables, enabling efficient, parallelized computation without sacrificing flexibility. In contrast to past work that achieved this on a model-by-model basis, we provide a general recipe that is applicable to the broad class of completely random measure-based priors. The efficacy of the proposed algorithm is evaluated on several popular nonparametric models, demonstrating a higher effective sample size per second than algorithms using marginalization, as well as higher predictive performance than models employing fixed truncations.
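To make the idea of a slice variable setting the truncation level concrete, here is a minimal Python sketch. It is not the authors' algorithm. It assumes one specific prior, a beta process with concentration 1 and mass `mass`, whose weights can be generated in decreasing order by multiplying successive Beta(mass, 1) draws. The function name `extend_until_below` and the parameters `slice_u` and `mass` are hypothetical and chosen only for illustration.

```python
import numpy as np

def extend_until_below(weights, slice_u, mass, rng):
    """Lazily extend a decreasing weight sequence until its last entry < slice_u.

    Assumption (not from the paper): weights follow the decreasing
    representation w_k = w_{k-1} * B_k with B_k ~ Beta(mass, 1), w_0 = 1.
    """
    weights = list(weights)
    last = weights[-1] if weights else 1.0
    while last >= slice_u:
        # U ** (1 / mass) with U ~ Uniform(0, 1) is a Beta(mass, 1) draw.
        last *= rng.uniform() ** (1.0 / mass)
        weights.append(last)
    return np.array(weights)

rng = np.random.default_rng(0)
slice_u = 0.05
weights = extend_until_below([], slice_u=slice_u, mass=2.0, rng=rng)

# Only atoms whose weight reaches the slice level are active; since the
# sequence is decreasing, every atom not yet generated is also below it.
active = weights[weights >= slice_u]
```

Because the sequence is decreasing, the first weight below the slice level guarantees that all later ones are below it too, so only finitely many atoms need to be instantiated in a given iteration, and the number instantiated changes as the slice variable changes.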
