Flexible mean field variational inference using mixtures of non-overlapping exponential families

Paper title

Flexible mean field variational inference using mixtures of non-overlapping exponential families

Authors

Spence, Jeffrey P.

Abstract

Sparse models are desirable for many applications across diverse domains as they can perform automatic variable selection, aid interpretability, and provide regularization. When fitting sparse models in a Bayesian framework, however, analytically obtaining a posterior distribution over the parameters of interest is intractable for all but the simplest cases. As a result practitioners must rely on either sampling algorithms such as Markov chain Monte Carlo or variational methods to obtain an approximate posterior. Mean field variational inference is a particularly simple and popular framework that is often amenable to analytically deriving closed-form parameter updates. When all distributions in the model are members of exponential families and are conditionally conjugate, optimization schemes can often be derived by hand. Yet, I show that using standard mean field variational inference can fail to produce sensible results for models with sparsity-inducing priors, such as the spike-and-slab. Fortunately, such pathological behavior can be remedied as I show that mixtures of exponential family distributions with non-overlapping support form an exponential family. In particular, any mixture of a diffuse exponential family and a point mass at zero to model sparsity forms an exponential family. Furthermore, specific choices of these distributions maintain conditional conjugacy. I use two applications to motivate these results: one from statistical genetics that has connections to generalized least squares with a spike-and-slab prior on the regression coefficients; and sparse probabilistic principal component analysis. The theoretical results presented here are broadly applicable beyond these two examples.
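
The central claim of the abstract, that a mixture of exponential family distributions with non-overlapping supports is itself an exponential family, can be sketched in a few lines. The notation below (mixture weights \pi_k, natural parameters \eta_k, sufficient statistics T_k, log-partition functions A_k, base measures h_k, supports S_k) is assumed for illustration and is not taken from the paper itself.

```latex
% Assumed setup (illustrative notation, not from the source):
% component k has support S_k, with S_j \cap S_k = \emptyset for j \neq k,
% and density p_k(x) = h_k(x) \exp\{ \eta_k^\top T_k(x) - A_k(\eta_k) \} on S_k.
\begin{align*}
p(x) &= \sum_{k=1}^{K} \pi_k \, p_k(x), \qquad \sum_{k=1}^{K} \pi_k = 1, \\
% Disjoint supports: for any x in the union of the S_k, exactly one term
% of the sum is nonzero, so the logarithm passes inside the sum.
\log p(x) &= \sum_{k=1}^{K} \mathbf{1}[x \in S_k]
  \bigl( \log \pi_k + \log h_k(x) + \eta_k^\top T_k(x) - A_k(\eta_k) \bigr), \\
p(x) &= h(x) \exp\Bigl\{ \sum_{k=1}^{K} \nu_k \, \mathbf{1}[x \in S_k]
  + \sum_{k=1}^{K} \eta_k^\top \bigl( \mathbf{1}[x \in S_k] \, T_k(x) \bigr) \Bigr\},
\end{align*}
% where h(x) = \prod_k h_k(x)^{\mathbf{1}[x \in S_k]}
% and \nu_k = \log \pi_k - A_k(\eta_k).
```

The last line has exponential family form, with sufficient statistics \mathbf{1}[x \in S_k] and \mathbf{1}[x \in S_k] T_k(x). The constraint that the weights sum to one makes one indicator redundant, and it can be absorbed into a log-partition function. For a spike-and-slab distribution, one would take S_1 = \{0\} for the point mass and S_2 = \mathbb{R} \setminus \{0\} for, say, a Gaussian slab, which loses no probability because the Gaussian assigns zero mass to the single point 0. Without disjoint supports the second step fails, since the logarithm of a sum with several nonzero terms does not split this way.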
