论文标题

数据科学的光谱方法:统计观点

Spectral Methods for Data Science: A Statistical Perspective

论文作者

Chen, Yuxin, Chi, Yuejie, Fan, Jianqing, Ma, Cong

论文摘要

光谱方法已成为一种简单而令人惊讶的有效方法,用于从大量,嘈杂和不完整的数据中提取信息。简而言之,光谱方法是指基于特征值(分别奇异值)和特征向量(分别奇异向量)的算法的集合,该算法是通过数据构建的一些适当设计的矩阵的算法。在机器学习,数据科学和信号处理中发现了各种各样的应用程序。由于它们的简单性和有效性,光谱方法不仅被用作独立估计器,而且经常被用来初始化其他更复杂的算法以提高性能。 虽然光谱方法的研究可以追溯到经典的基质扰动理论和矩的方法,但过去十年来,借助非Asymymptotic随机矩阵理论,通过统计建模的镜头来实现巨大的理论进步。该专着旨在从现代统计的角度提出对光谱方法的系统,全面但易于访问的介绍,突出了它们在各种大规模应用中的算法含义。特别是,我们的博览会围绕着跨越各种应用的几个中心问题吸引:如何表征光谱方法的样本效率,以达到目标统计准确性的目标水平,以及如何在面对随机噪声,缺失数据和对抗性腐败时评估其稳定性?除了常规$ \ ell_2 $扰动分析外,我们还提供了一个系统的$ \ ell _ {\ infty} $和$ \ ell_ {2,\ infty} $扰动理论,用于特征和奇异的子空间,这仅是由于最近才能提供强大的“剩余”分析框架。

Spectral methods have emerged as a simple yet surprisingly effective approach for extracting information from massive, noisy and incomplete data. In a nutshell, spectral methods refer to a collection of algorithms built upon the eigenvalues (resp. singular values) and eigenvectors (resp. singular vectors) of some properly designed matrices constructed from data. A diverse array of applications have been found in machine learning, data science, and signal processing. Due to their simplicity and effectiveness, spectral methods are not only used as a stand-alone estimator, but also frequently employed to initialize other more sophisticated algorithms to improve performance. While the studies of spectral methods can be traced back to classical matrix perturbation theory and methods of moments, the past decade has witnessed tremendous theoretical advances in demystifying their efficacy through the lens of statistical modeling, with the aid of non-asymptotic random matrix theory. This monograph aims to present a systematic, comprehensive, yet accessible introduction to spectral methods from a modern statistical perspective, highlighting their algorithmic implications in diverse large-scale applications. In particular, our exposition gravitates around several central questions that span various applications: how to characterize the sample efficiency of spectral methods in reaching a target level of statistical accuracy, and how to assess their stability in the face of random noise, missing data, and adversarial corruptions? In addition to conventional $\ell_2$ perturbation analysis, we present a systematic $\ell_{\infty}$ and $\ell_{2,\infty}$ perturbation theory for eigenspace and singular subspaces, which has only recently become available owing to a powerful "leave-one-out" analysis framework.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源