Paper Title

Fundamental Limits of Low-Rank Matrix Estimation with Diverging Aspect Ratios

Authors

Andrea Montanari, Yuchen Wu

Abstract

We consider the problem of estimating the factors of a low-rank $n \times d$ matrix, when this is corrupted by additive Gaussian noise. A special example of our setting corresponds to clustering mixtures of Gaussians with equal (known) covariances. Simple spectral methods do not take into account the distribution of the entries of these factors and are therefore often suboptimal. Here, we characterize the asymptotics of the minimum estimation error under the assumption that the distribution of the entries is known to the statistician. Our results apply to the high-dimensional regime $n, d \to \infty$ and $d / n \to \infty$ (or $d / n \to 0$) and generalize earlier work that focused on the proportional asymptotics $n, d \to \infty$, $d / n \to \delta \in (0, \infty)$. We outline an interesting signal strength regime in which $d / n \to \infty$ and partial recovery is possible for the left singular vectors while impossible for the right singular vectors. We illustrate the general theory by deriving consequences for Gaussian mixture clustering and carrying out a numerical study on genomics data.
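As a rough illustration of the setting, the sketch below simulates a rank-one instance, a symmetric two-component Gaussian mixture with identity covariance, and applies the simple spectral baseline that the abstract describes as often suboptimal. It is not the paper's estimator or analysis. The function names (`simulate_mixture`, `spectral_estimate`, `overlap`), the scaling of the signal, and the values of `n`, `d`, and `signal_norm` are all assumptions chosen for illustration.

```python
import numpy as np


def simulate_mixture(n, d, signal_norm, rng):
    """Draw A = labels * theta^T + Z (hypothetical scaling, for illustration only)."""
    labels = rng.choice([-1.0, 1.0], size=n)           # left factor: cluster labels
    theta = rng.standard_normal(d)
    theta *= signal_norm / np.linalg.norm(theta)       # right factor: cluster mean
    noise = rng.standard_normal((n, d))                # additive Gaussian noise
    return np.outer(labels, theta) + noise, labels, theta


def spectral_estimate(A):
    """Return the top left and right singular vectors of A."""
    U, _, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, 0], Vt[0]


def overlap(a, b):
    """Absolute cosine similarity, which ignores the global sign ambiguity."""
    return abs(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


rng = np.random.default_rng(0)
A, labels, theta = simulate_mixture(n=200, d=50, signal_norm=5.0, rng=rng)
u_hat, v_hat = spectral_estimate(A)

left_overlap = overlap(u_hat, labels)     # recovery of the left factor
right_overlap = overlap(v_hat, theta)     # recovery of the right factor
label_agreement = abs(np.mean(np.sign(u_hat) * labels))

print(f"left overlap:    {left_overlap:.3f}")
print(f"right overlap:   {right_overlap:.3f}")
print(f"label agreement: {label_agreement:.3f}")
```

The signal here is deliberately strong, so both factors are recovered well. Shrinking `signal_norm` while increasing `d` relative to `n` gives an informal way to explore how the two overlaps can behave differently, though the precise thresholds are the subject of the paper and are not reproduced by this sketch.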
