Paper Title
Beyond EM Algorithm on Over-specified Two-Component Location-Scale Gaussian Mixtures
Paper Authors
Paper Abstract
The Expectation-Maximization (EM) algorithm has been the predominant method for approximating the maximum likelihood estimate of location-scale Gaussian mixtures. However, when the model is over-specified, namely, when the number of components chosen to fit the data is larger than the unknown true number of components, EM needs a number of iterations that is polynomial in the sample size to reach its final statistical radius; this is computationally expensive in practice. The slow convergence of EM is due to the lack of local strong convexity with respect to the location parameter in the negative population log-likelihood function, i.e., the limit of the negative sample log-likelihood function as the sample size goes to infinity. To efficiently exploit the curvature of the negative log-likelihood function, we develop the Exponential Location Update (ELU) algorithm, focusing specifically on two-component location-scale Gaussian mixtures. The idea of the ELU algorithm is to first obtain the exact optimal solution for the scale parameter and then perform gradient descent with an exponentially increasing step size on the location parameter. We demonstrate theoretically and empirically that the ELU iterates converge to the final statistical radius of the model after only a logarithmic number of iterations. To the best of our knowledge, this resolves a long-standing open question in the literature: developing an optimization algorithm with optimal statistical and computational complexity for parameter estimation, at least under certain settings of over-specified Gaussian mixture models.
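The two-step structure described in the abstract (a closed-form scale update followed by a location gradient step with an exponentially growing step size) can be sketched as follows. This is a hedged illustration, not the paper's actual algorithm: the function name `elu_sketch`, the moment-based scale update `s2 = mean(x**2) - theta**2` (a stand-in for the paper's exact scale solution), and all step-size constants are assumptions, shown for the symmetric two-component model 0.5*N(theta, s2) + 0.5*N(-theta, s2) fit to over-specified data drawn from a single N(0, 1).

```python
import numpy as np

def elu_sketch(x, theta0=0.5, eta0=0.5, growth=1.2, eta_max=15.0, n_iters=60):
    """Illustrative ELU-style iteration for the symmetric two-component
    mixture 0.5*N(theta, s2) + 0.5*N(-theta, s2) (NOT the paper's exact
    algorithm; the scale step and schedule constants are assumptions)."""
    theta = theta0
    s2 = max(np.mean(x**2) - theta**2, 1e-8)
    for t in range(n_iters):
        # Scale step: closed-form (moment-based) stand-in for the
        # exact optimal scale solution given the current location.
        s2 = max(np.mean(x**2) - theta**2, 1e-8)
        # For this symmetric mixture, the posterior responsibility of the
        # +theta component is w = sigmoid(2*theta*x/s2), so 2*w - 1 equals
        # tanh(theta*x/s2).
        m = np.tanh(theta * x / s2)
        # Gradient of the negative sample log-likelihood w.r.t. theta.
        grad = (theta - np.mean(x * m)) / s2
        # Location step: gradient descent with an exponentially increasing
        # (capped) step size to compensate for the flat curvature near 0.
        eta = min(eta0 * growth**t, eta_max)
        theta -= eta * grad
    return theta, s2

# Over-specified setting: data from a single Gaussian, fit with two components.
rng = np.random.default_rng(0)
x = rng.standard_normal(20000)
theta_hat, s2_hat = elu_sketch(x)  # theta_hat shrinks toward 0, s2_hat near 1
```

The exponentially growing step size is the key design choice: near the over-specified truth the gradient in the location parameter is of high polynomial order in `theta`, so a constant step size would need polynomially many iterations, while the growing schedule reaches the statistical radius in logarithmically many.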