Finite mixture models do not reliably learn the number of components

Paper title

Finite mixture models do not reliably learn the number of components

Authors

Diana Cai, Trevor Campbell, Tamara Broderick

Abstract

Scientists and engineers are often interested in learning the number of subpopulations (or components) present in a data set. A common suggestion is to use a finite mixture model (FMM) with a prior on the number of components. Past work has shown the resulting FMM component-count posterior is consistent; that is, the posterior concentrates on the true, generating number of components. But consistency requires the assumption that the component likelihoods are perfectly specified, which is unrealistic in practice. In this paper, we add rigor to data-analysis folk wisdom by proving that under even the slightest model misspecification, the FMM component-count posterior diverges: the posterior probability of any particular finite number of components converges to 0 in the limit of infinite data. Contrary to intuition, posterior-density consistency is not sufficient to establish this result. We develop novel sufficient conditions that are more realistic and easily checkable than those common in the asymptotics literature. We illustrate practical consequences of our theory on simulated and real data.
