Paper Title
Statistical Efficiency of Score Matching: The View from Isoperimetry
Paper Authors
Abstract
Deep generative models parametrized up to a normalizing constant (e.g. energy-based models) are difficult to train by maximizing the likelihood of the data because the likelihood and/or gradients thereof cannot be explicitly or efficiently written down. Score matching is a training method whereby, instead of fitting the likelihood $\log p(x)$ for the training data, we fit the score function $\nabla_x \log p(x)$ -- obviating the need to evaluate the partition function. Though this estimator is known to be consistent, it is unclear whether (and when) its statistical efficiency is comparable to that of maximum likelihood -- which is known to be (asymptotically) optimal. We initiate this line of inquiry in this paper, and show a tight connection between the statistical efficiency of score matching and the isoperimetric properties of the distribution being estimated -- i.e. the Poincaré, log-Sobolev and isoperimetric constants -- quantities which govern the mixing time of Markov processes like Langevin dynamics. Roughly, we show that the score matching estimator is statistically comparable to maximum likelihood when the distribution has a small isoperimetric constant. Conversely, if the distribution has a large isoperimetric constant -- even for simple families of distributions like exponential families with rich enough sufficient statistics -- score matching will be substantially less efficient than maximum likelihood. We suitably formalize these results both in the finite sample regime and in the asymptotic regime. Finally, we identify a direct parallel in the discrete setting, where we connect the statistical properties of pseudolikelihood estimation with approximate tensorization of entropy and the Glauber dynamics.
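To make the setup concrete, the following is a minimal sketch (not taken from the paper) of Hyvärinen's score matching objective for a 1-D Gaussian family $p_\theta(x) \propto \exp(-\lambda (x-\mu)^2/2)$. The empirical objective $\hat J(\theta) = \frac{1}{n}\sum_i \left[\tfrac{1}{2} s_\theta(x_i)^2 + s_\theta'(x_i)\right]$ involves only the score and its derivative, so the partition function never appears. The names `sm_objective`, `mu_hat`, and `lam_hat` are illustrative choices, not from the source.

```python
# A minimal sketch of score matching for a 1-D Gaussian family
# p_theta(x) ∝ exp(-lam * (x - mu)^2 / 2); names are illustrative.
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(loc=2.0, scale=1.5, size=50_000)

def sm_objective(mu, lam, x):
    """Empirical Hyvarinen score matching loss.

    Model score: s(x) = -lam * (x - mu); its derivative: s'(x) = -lam.
    Loss: mean of 0.5 * s(x)^2 + s'(x), which needs no partition function.
    """
    score = -lam * (x - mu)
    return np.mean(0.5 * score**2 - lam)

# For this family the minimizer has a closed form: mu* is the sample
# mean, and lam* is the inverse sample second moment about mu* --
# identical to the maximum likelihood estimates, consistent with the
# claim that score matching matches MLE for well-conditioned (small
# isoperimetric constant) distributions such as a Gaussian.
mu_hat = data.mean()
lam_hat = 1.0 / np.mean((data - mu_hat) ** 2)
```

A Gaussian has a small Poincaré constant, so this is the favorable regime described in the abstract; the paper's negative results concern families where the isoperimetric constant blows up.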