论文标题
使用种植结构的张量聚类:统计最佳和计算限制
Tensor Clustering with Planted Structures: Statistical Optimality and Computational Limits
论文作者
论文摘要
本文研究了使用种植结构的高阶聚类的统计和计算限制。我们专注于两个聚类模型,即恒定的高阶聚类(CHC)和排名一号高阶聚类(ROHC),并研究了测试簇是否存在(检测)并识别群集的支持(恢复)的方法和理论。 具体而言,我们确定了CHC和ROHC检测/恢复在统计学上是可能的信噪比的急剧边界。我们还发展了紧密的计算阈值:当信噪比低于这些阈值时,我们证明多项式时间算法无法在超差异植物群(HPC)检测的计算硬度猜想下解决这些问题,而过度分别植物植物的浓度量(HPDS)恢复。我们还提出了当信噪比高于这些阈值时,可以实现可靠的检测和恢复的多项式时间张量算法。稀疏性和张量结构都在高阶张量聚类中产生计算屏障。它们之间的相互作用导致在统计和计算相过渡图,算法方法,硬度猜想和证明技术方面的文献中,高阶张量聚类和基质聚类之间的显着差异。据我们所知,我们是第一个为这种双重计算级级问题的统计和计算权衡做出彻底表征的人。最后,我们为HPC检测的计算硬度猜想(通过低级多项式和大都市方法)和HPDS恢复(通过低级多项式方法)提供了证据。
This paper studies the statistical and computational limits of high-order clustering with planted structures. We focus on two clustering models, constant high-order clustering (CHC) and rank-one higher-order clustering (ROHC), and study the methods and theory for testing whether a cluster exists (detection) and identifying the support of cluster (recovery). Specifically, we identify the sharp boundaries of signal-to-noise ratio for which CHC and ROHC detection/recovery are statistically possible. We also develop the tight computational thresholds: when the signal-to-noise ratio is below these thresholds, we prove that polynomial-time algorithms cannot solve these problems under the computational hardness conjectures of hypergraphic planted clique (HPC) detection and hypergraphic planted dense subgraph (HPDS) recovery. We also propose polynomial-time tensor algorithms that achieve reliable detection and recovery when the signal-to-noise ratio is above these thresholds. Both sparsity and tensor structures yield the computational barriers in high-order tensor clustering. The interplay between them results in significant differences between high-order tensor clustering and matrix clustering in literature in aspects of statistical and computational phase transition diagrams, algorithmic approaches, hardness conjecture, and proof techniques. To our best knowledge, we are the first to give a thorough characterization of the statistical and computational trade-off for such a double computational-barrier problem. Finally, we provide evidence for the computational hardness conjectures of HPC detection (via low-degree polynomial and Metropolis methods) and HPDS recovery (via low-degree polynomial method).