稀疏PCA的精确和近似算法

论文标题

稀疏PCA的精确和近似算法

Exact and Approximation Algorithms for Sparse PCA

论文作者

Li, Yongchun, Xie, Weijun

论文摘要

稀疏PCA（SPCA）是机器学习和数据分析中的基本模型，该模型目睹了各种应用领域，例如金融，制造，生物学，医疗保健。为了从协方差矩阵中选择一个预先指定的主体子元素，以最大程度地提高其最大的特征值，以提高可解释性目的，SPCA可以通过功能选择和降低维度降低来推进常规的PCA。本文通过利用协方差矩阵的频谱分解和最大特征值的特性来提出两个精确的混合企业SDP（MISDP）。然后，我们分析其连续放松值的理论最优差距，并证明它们比最先进的理论最强。我们进一步表明，两个错误的连续放松可以作为鞍点问题重铸，而无需涉及半明确锥，因此可以通过一阶方法（例如亚甘麦方法）有效地解决。由于总体而言，由于现成的求解器很难解决错误，因此我们通过与Misdps相似的混合构成线性程序（MILP）任意准确地近似SPCA。为了更可扩展，我们还分析了贪婪和局部搜索算法，证明了它们的第一个已知近似值，并表明近似值比很紧。我们的数值研究表明，所提出的小型DP的连续松弛值非常接近最佳性，所提出的MILP模型可以将中小型实例求解到最优性，并且近似算法对所有实例都非常有效。最后，当存在多个协方差矩阵时，我们将分析扩展到具有非对称矩阵和稀疏的fair PCA（SFPCA）的稀疏SVD（R1-SSVD），每个矩阵与受保护组相对应。

Sparse PCA (SPCA) is a fundamental model in machine learning and data analytics, which has witnessed a variety of application areas such as finance, manufacturing, biology, healthcare. To select a prespecified-size principal submatrix from a covariance matrix to maximize its largest eigenvalue for the better interpretability purpose, SPCA advances the conventional PCA with both feature selection and dimensionality reduction. This paper proposes two exact mixed-integer SDPs (MISDPs) by exploiting the spectral decomposition of the covariance matrix and the properties of the largest eigenvalues. We then analyze the theoretical optimality gaps of their continuous relaxation values and prove that they are stronger than that of the state-of-art one. We further show that the continuous relaxations of two MISDPs can be recast as saddle point problems without involving semi-definite cones, and thus can be effectively solved by first-order methods such as the subgradient method. Since off-the-shelf solvers, in general, have difficulty in solving MISDPs, we approximate SPCA with arbitrary accuracy by a mixed-integer linear program (MILP) of a similar size as MISDPs. To be more scalable, we also analyze greedy and local search algorithms, prove their first-known approximation ratios, and show that the approximation ratios are tight. Our numerical study demonstrates that the continuous relaxation values of the proposed MISDPs are quite close to optimality, the proposed MILP model can solve small and medium-size instances to optimality, and the approximation algorithms work very well for all the instances. Finally, we extend the analyses to Rank-one Sparse SVD (R1-SSVD) with non-symmetric matrices and Sparse Fair PCA (SFPCA) when there are multiple covariance matrices, each corresponding to a protected group.

下载PDF全文

下载文献需遵守相关版权规定

论文标题