Paper Title

Using the left Gram matrix to cluster high dimensional data

Authors

Rahman, Shahina; Johnson, Valen E.; Rao, Suhasini Subba

Abstract

For high dimensional data, where P features for N objects (P >> N) are represented in an NxP matrix X, we describe a clustering algorithm based on the normalized left Gram matrix, G = XX'/P. Under certain regularity conditions, the rows in G that correspond to objects in the same cluster converge to the same mean vector. By clustering on the row means, the algorithm does not require preprocessing by dimension reduction or feature selection techniques and does not require specification of tuning or hyperparameter values. Because it is based on the NxN matrix G, it has a lower computational cost than many methods based on clustering the feature matrix X. When compared to 14 other clustering algorithms applied to 32 benchmarked microarray datasets, the proposed algorithm provided the most accurate estimate of the underlying cluster configuration more than twice as often as its closest competitors.
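
To make the central object concrete, the sketch below computes the normalized left Gram matrix G = XX'/P on simulated data and checks that rows of G belonging to the same cluster lie closer together than rows from different clusters. It is only an illustration under stated assumptions, not the authors' algorithm: the two-cluster Gaussian toy data, the helper names `left_gram` and `row_distance`, and the use of Euclidean distance between rows are all hypothetical choices, and the abstract does not specify how the rows are actually clustered.

```python
import math
import random


def left_gram(X):
    """Return the normalized left Gram matrix G = X X' / P for an N x P list of rows."""
    p = len(X[0])
    return [[sum(a * b for a, b in zip(row_i, row_j)) / p for row_j in X] for row_i in X]


def row_distance(G, i, j):
    """Euclidean distance between rows i and j of G (an illustrative choice)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(G[i], G[j])))


# Hypothetical toy data: N = 10 objects, P = 2000 features (P >> N), two clusters of 5.
# Cluster 0 has feature mean +1, cluster 1 has feature mean -1, both with unit-variance noise.
random.seed(0)
n_per_cluster, p = 5, 2000
labels = [0] * n_per_cluster + [1] * n_per_cluster
X = [[random.gauss(1.0 if lab == 0 else -1.0, 1.0) for _ in range(p)] for lab in labels]

G = left_gram(X)  # N x N, so later steps no longer depend on P

n = len(X)
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
within = [row_distance(G, i, j) for i, j in pairs if labels[i] == labels[j]]
between = [row_distance(G, i, j) for i, j in pairs if labels[i] != labels[j]]

print(f"G is {len(G)} x {len(G[0])}")
print(f"largest within-cluster row distance:   {max(within):.3f}")
print(f"smallest between-cluster row distance: {min(between):.3f}")
```

Once G has been formed, every subsequent comparison works on an N x N matrix rather than the N x P feature matrix, which is the source of the computational saving the abstract describes when P is much larger than N.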
