Paper title
Non-Exhaustive, Overlapping Co-Clustering: An Extended Analysis
Authors
Abstract
The goal of co-clustering is to simultaneously identify a clustering of the rows and of the columns of a two-dimensional data matrix. A number of co-clustering techniques have been proposed, including information-theoretic co-clustering and the minimum sum-squared residue co-clustering method. However, most existing co-clustering algorithms are designed to find pairwise disjoint and exhaustive co-clusters, while many real-world datasets contain not only a large overlap between co-clusters but also outliers that should not belong to any co-cluster. In this paper, we formulate the problem of Non-Exhaustive, Overlapping Co-Clustering, where both the row and the column clusters are allowed to overlap with each other and outliers for each dimension of the data matrix are not assigned to any cluster. To solve this problem, we propose intuitive objective functions and develop an efficient iterative algorithm, which we call the NEO-CC algorithm. We theoretically show that the NEO-CC algorithm monotonically decreases the proposed objective functions. Experimental results show that the NEO-CC algorithm is able to effectively capture the underlying co-clustering structure of real-world data, and thus outperforms state-of-the-art clustering and co-clustering methods. This manuscript includes an extended analysis of [21].
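To make the problem setting concrete, the sketch below shows one possible way to represent a non-exhaustive, overlapping assignment as 0/1 membership lists. It only illustrates the setting described in the abstract and is not the NEO-CC algorithm or its objective function; the function name `membership_summary` and the toy data are assumptions introduced here for illustration.

```python
# Illustrative only: the names and data below are hypothetical, not from the paper.

def membership_summary(assignment):
    """Classify each item by how many clusters it belongs to.

    `assignment` is a list of 0/1 lists, one per item (a row or a column
    of the data matrix), with one entry per cluster.
    """
    outliers, single, overlapping = [], [], []
    for item, memberships in enumerate(assignment):
        count = sum(memberships)
        if count == 0:
            outliers.append(item)      # non-exhaustive: assigned to no cluster
        elif count == 1:
            single.append(item)        # the usual disjoint case
        else:
            overlapping.append(item)   # overlap: assigned to several clusters
    return {"outliers": outliers, "single": single, "overlapping": overlapping}


# Five rows of a data matrix and two row clusters.
row_assignment = [
    [1, 0],
    [1, 1],  # belongs to both row clusters
    [0, 1],
    [0, 0],  # outlier row
    [0, 1],
]
# Four columns of the same matrix and two column clusters.
col_assignment = [
    [1, 0],
    [1, 1],  # belongs to both column clusters
    [0, 1],
    [0, 0],  # outlier column
]

row_summary = membership_summary(row_assignment)
col_summary = membership_summary(col_assignment)
print(row_summary)
print(col_summary)
```

A disjoint and exhaustive co-clustering would correspond to every item landing in the `single` list for both rows and columns; the setting in the abstract relaxes both constraints.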