论文标题
基于比较的近比较聚类
Near-Optimal Comparison Based Clustering
论文作者
论文摘要
聚类的目的是将类似对象分组为有意义的分区。当给出对象之间的明确相似性度量时,就可以很好地理解此过程。但是,当此信息不容易获得时,众所周知,相反,只观察到诸如“对象I更像j相比j与k”更相似的顺序比较。在本文中,我们使用两步过程来解决此问题:在使用基于半明确编程(SDP)的聚类方法之前,我们估计比较中的成对相似性矩阵。从理论上讲,我们表明我们的方法可以使用几乎最佳数量的被动比较来精确地恢复种植的聚类。我们从经验上验证了我们的理论发现,并证明了我们方法在实际数据上的良好行为。
The goal of clustering is to group similar objects into meaningful partitions. This process is well understood when an explicit similarity measure between the objects is given. However, far less is known when this information is not readily available and, instead, one only observes ordinal comparisons such as "object i is more similar to j than to k." In this paper, we tackle this problem using a two-step procedure: we estimate a pairwise similarity matrix from the comparisons before using a clustering method based on semi-definite programming (SDP). We theoretically show that our approach can exactly recover a planted clustering using a near-optimal number of passive comparisons. We empirically validate our theoretical findings and demonstrate the good behaviour of our method on real data.