论文标题

稀疏PCA:几何方法

Sparse PCA: a Geometric Approach

论文作者

Bertsimas, Dimitris, Kitane, Driss Lahlou

论文摘要

我们考虑使用具有固定基数支持的正交稀疏主组件从数据矩阵中解释的方差的问题。尽管大多数现有的方法都集中在通过放气构建主要成分(PC),但我们建议Geospca(一种新颖的算法)一次构建所有PC,同时满足正交性约束,从而带来了对屈光度的实质性好处。这种新颖的方法基于协方差矩阵的左侧特征值,该矩阵通过使用二进制线性优化问题近似最佳解决方案来帮助避免问题的非转换性,从而找到最佳解决方案。所得近似可用于解决稀疏PCA问题的不同版本,包括主组件共享相同的支持或具有不相交的支持以及结构化的稀疏PCA问题的情况。我们还提出了最佳范围,并说明了Geospca在选定的现实世界中的好处,无论是在解释的差异,稀疏性和障碍方面。改进与贪婪算法通常与最先进的技术相提并论,在差异方面达到了24%,同时解决了现实世界中的问题,并在几分钟内使用10,000次变量和支持100s的基数支持。我们还将GEOSPCA应用于面部识别问题,与其他基于PCA的技术(例如结构化稀疏PCA)相比,提高了10%以上。

We consider the problem of maximizing the variance explained from a data matrix using orthogonal sparse principal components that have a support of fixed cardinality. While most existing methods focus on building principal components (PCs) iteratively through deflation, we propose GeoSPCA, a novel algorithm to build all PCs at once while satisfying the orthogonality constraints which brings substantial benefits over deflation. This novel approach is based on the left eigenvalues of the covariance matrix which helps circumvent the non-convexity of the problem by approximating the optimal solution using a binary linear optimization problem that can find the optimal solution. The resulting approximation can be used to tackle different versions of the sparse PCA problem including the case in which the principal components share the same support or have disjoint supports and the Structured Sparse PCA problem. We also propose optimality bounds and illustrate the benefits of GeoSPCA in selected real world problems both in terms of explained variance, sparsity and tractability. Improvements vs. the greedy algorithm, which is often at par with state-of-the-art techniques, reaches up to 24% in terms of variance while solving real world problems with 10,000s of variables and support cardinality of 100s in minutes. We also apply GeoSPCA in a face recognition problem yielding more than 10% improvement vs. other PCA based technique such as structured sparse PCA.

扫码加入交流群

加入微信交流群

微信交流群二维码

扫码加入学术交流群,获取更多资源