Paper Title
The Exact Asymptotic Form of Bayesian Generalization Error in Latent Dirichlet Allocation
Paper Authors
Paper Abstract
Latent Dirichlet allocation (LDA) obtains essential information from data by using Bayesian inference. It is applied to knowledge discovery via dimension reduction and clustering in many fields. However, its generalization error has not yet been clarified, since LDA is a singular statistical model in which there is no one-to-one mapping from parameters to probability distributions. In this paper, we give the exact asymptotic forms of its generalization error and marginal likelihood by theoretically analyzing its learning coefficient using algebraic geometry. The theoretical result shows that the Bayesian generalization error in LDA is expressed in terms of that of matrix factorization plus a penalty arising from the simplex restriction of LDA's parameter region. Numerical experiments are consistent with the theoretical result.
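As background (this context is standard in singular learning theory and is assumed here rather than stated in the abstract), the learning coefficient studied in the paper is the constant $\lambda$ appearing in the following well-known asymptotic expansions of the expected Bayesian generalization error $G_n$ and the free energy $F_n$ (negative log marginal likelihood) for a singular model:

```latex
% Standard asymptotics from singular learning theory, where \lambda is the
% learning coefficient (real log canonical threshold), m its multiplicity,
% n the sample size, and S_n the empirical entropy of the true distribution.
\mathbb{E}[G_n] = \frac{\lambda}{n} + o\!\left(\frac{1}{n}\right),
\qquad
F_n = n S_n + \lambda \log n - (m-1)\log\log n + O_p(1).
```

Determining $\lambda$ exactly for LDA, via resolution of singularities from algebraic geometry, is what yields the exact asymptotic forms claimed in the abstract.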