Paper Title

Robust PCA for High Dimensional Data based on Characteristic Transformation

Authors

Lingyu He, Yanrong Yang, Bo Zhang

Abstract

In this paper, we propose a novel robust Principal Component Analysis (PCA) for high-dimensional data in the presence of various heterogeneities, especially heavy tails and outliers. A transformation motivated by the characteristic function is constructed to improve the robustness of classical PCA. Besides handling typical outliers, the proposed method has the unique advantage of dealing with heavy-tailed data whose covariances may not exist (they may be infinite, for instance). The proposed approach is also a special case of kernel principal component analysis (KPCA) and inherits robustness and non-linearity from a bounded, non-linear kernel function. The merits of the new method are illustrated by several statistical properties, including an upper bound on the excess error and the behavior of the large eigenvalues under a spiked covariance model. In addition, we demonstrate the advantages of our method over classical PCA through a variety of simulations. Finally, we apply the new robust PCA to classify mice with different genotypes in a biological study based on their protein expression data, and find that our method identifies abnormal mice more accurately than classical PCA.
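To make the kernel PCA connection concrete, here is a minimal sketch of kernel PCA with a bounded kernel applied to heavy-tailed data. It is not the authors' transformation, which the abstract does not specify. The Gaussian kernel, the bandwidth `gamma`, the function name `bounded_kernel_pca`, and the Cauchy test data are all assumptions chosen for illustration. The Gaussian kernel is used only because it is bounded in (0, 1] and is itself the characteristic function of a Gaussian distribution, so the kernel matrix stays finite even when the sample covariance is unstable.

```python
import numpy as np


def bounded_kernel_pca(X, n_components=2, gamma=0.1):
    """Kernel PCA with the Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2).

    Illustrative stand-in only: the kernel and gamma are assumptions,
    not the transformation proposed in the paper.
    """
    # Pairwise squared Euclidean distances (clipped at 0 against round-off).
    sq_norms = np.sum(X ** 2, axis=1)
    sq_dists = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    sq_dists = np.maximum(sq_dists, 0.0)

    # Every entry of K lies in [0, 1], however extreme the observations are.
    K = np.exp(-gamma * sq_dists)

    # Center the kernel matrix in feature space: Kc = H K H.
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    Kc = H @ K @ H

    # Eigendecomposition of the symmetric centered kernel matrix.
    eigvals, eigvecs = np.linalg.eigh(Kc)
    order = np.argsort(eigvals)[::-1][:n_components]
    eigvals = np.maximum(eigvals[order], 0.0)
    eigvecs = eigvecs[:, order]

    # Component scores of the training points.
    scores = eigvecs * np.sqrt(eigvals)
    return eigvals, scores


# Hypothetical heavy-tailed sample: Cauchy entries have no finite variance.
rng = np.random.default_rng(0)
X = rng.standard_cauchy(size=(200, 5))

eigvals, scores = bounded_kernel_pca(X, n_components=2)

# For contrast: the top eigenvalue of the sample covariance used by classical PCA.
top_cov_eigval = np.linalg.eigvalsh(np.cov(X, rowvar=False))[-1]
print("kernel PCA eigenvalues:", eigvals)
print("top sample-covariance eigenvalue:", top_cov_eigval)
```

Because the kernel is bounded, the eigenvalues of the centered kernel matrix can never sum to more than the sample size, whereas the sample covariance of Cauchy data can be dominated by a handful of extreme observations.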
