Paper Title
Block Model Guided Unsupervised Feature Selection
Paper Authors
Paper Abstract
Feature selection is a core area of data mining, and a recent innovation in it is graph-driven unsupervised feature selection for linked data. In this setting we have a dataset $\mathbf{Y}$ consisting of $n$ instances, each with $m$ features, and a corresponding $n$-node graph (with adjacency matrix $\mathbf{A}$) in which an edge indicates that the two instances it connects are similar. Existing efforts for unsupervised feature selection on attributed networks have explored either directly regenerating the links, by solving for $f$ such that $f(\mathbf{y}_i,\mathbf{y}_j) \approx \mathbf{A}_{i,j}$, or finding community structure in $\mathbf{A}$ and using the features in $\mathbf{Y}$ to predict these communities. However, graph-driven unsupervised feature selection remains understudied with respect to more complex forms of guidance. Here we take the novel approach of first building a block model on the graph and then using the block model for feature selection. That is, we discover $\mathbf{F}\mathbf{M}\mathbf{F}^T \approx \mathbf{A}$ and then find a subset of features $\mathcal{S}$ that induces another graph preserving both $\mathbf{F}$ and $\mathbf{M}$. We call our approach Block Model Guided Unsupervised Feature Selection (BMGUFS). Experimental results on several real-world public datasets show that our method outperforms the state of the art in finding high-quality features for clustering.
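The two-stage idea (fit a block model to $\mathbf{A}$, then pick features whose induced graph agrees with it) can be illustrated with a small pure-Python sketch. This is not the BMGUFS optimization, which the abstract does not specify. It rests on several assumptions of our own. $\mathbf{F}$ is taken as a hard assignment of each node to one of $k$ blocks, stored as a label list $c$, so that $(\mathbf{F}\mathbf{M}\mathbf{F}^T)_{ij} = \mathbf{M}_{c_i c_j}$. $\mathbf{M}$ holds block-to-block link densities. The graph induced by a feature subset $\mathcal{S}$ links two instances whose Euclidean distance on $\mathcal{S}$ is at most a threshold `eps`. "Preserving $\mathbf{F}$ and $\mathbf{M}$" is approximated by the squared error between that graph and $\mathbf{F}\mathbf{M}\mathbf{F}^T$, which is minimized by greedy forward selection. All function names and the parameters `eps`, `budget`, and `init` are hypothetical.

```python
import math

# Hypothetical sketch of "block model first, then feature selection";
# NOT the BMGUFS algorithm from the paper. F is stored as hard block labels,
# so (F M F^T)[i][j] == M[labels[i]][labels[j]].


def block_densities(A, labels, k):
    """Estimate M[a][b]: fraction of ordered pairs (i, j), i != j, with
    i in block a and j in block b that are linked in A."""
    n = len(A)
    edges = [[0.0] * k for _ in range(k)]
    pairs = [[0] * k for _ in range(k)]
    for i in range(n):
        for j in range(n):
            if i != j:
                edges[labels[i]][labels[j]] += A[i][j]
                pairs[labels[i]][labels[j]] += 1
    return [[edges[a][b] / pairs[a][b] if pairs[a][b] else 0.0
             for b in range(k)] for a in range(k)]


def block_model_error(G, labels, M):
    """Squared error between graph G and F M F^T, ignoring the diagonal."""
    n = len(G)
    return sum((G[i][j] - M[labels[i]][labels[j]]) ** 2
               for i in range(n) for j in range(n) if i != j)


def fit_block_model(A, k, init=None, max_iter=20):
    """Hard-assignment local search: alternate between estimating M and
    moving each node to the block whose row of M best matches its links."""
    n = len(A)
    labels = list(init) if init is not None else [i * k // n for i in range(n)]
    for _ in range(max_iter):
        M = block_densities(A, labels, k)
        new_labels = [
            min(range(k), key=lambda c: sum(
                (A[i][j] - M[c][labels[j]]) ** 2 for j in range(n) if j != i))
            for i in range(n)
        ]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, block_densities(A, labels, k)


def induced_graph(Y, features, eps):
    """Hypothetical construction: link i != j when their Euclidean distance
    on the selected feature columns is at most eps."""
    n = len(Y)
    rows = [[Y[i][f] for f in features] for i in range(n)]
    return [[1 if i != j and math.dist(rows[i], rows[j]) <= eps else 0
             for j in range(n)] for i in range(n)]


def greedy_select(Y, labels, M, budget, eps=1.0):
    """Greedy forward selection: repeatedly add the feature whose induced
    graph (together with those already chosen) best fits the block model."""
    selected, remaining = [], list(range(len(Y[0])))
    for _ in range(budget):
        best = min(remaining, key=lambda f: block_model_error(
            induced_graph(Y, selected + [f], eps), labels, M))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: two 3-node cliques (instances 0-2 and 3-5).
A = [[1 if i != j and (i < 3) == (j < 3) else 0 for j in range(6)]
     for i in range(6)]
# Columns: feature 0 = strong cluster signal, 1 = noise, 2 = weaker signal.
Y = [[0.0, 5.0, 1.0], [0.1, 0.0, 1.2], [0.2, 9.0, 0.9],
     [5.0, 4.0, 2.5], [5.1, 8.0, 2.7], [5.2, 1.0, 2.1]]

# init deliberately misplaces node 2; the local search moves it back.
labels, M = fit_block_model(A, k=2, init=[0, 0, 1, 1, 1, 1])
print(labels, M)                              # [0, 0, 0, 1, 1, 1] [[1.0, 0.0], [0.0, 1.0]]
print(greedy_select(Y, labels, M, budget=2))  # [0, 2]
```

In the toy data the greedy search keeps features 0 and 2 and drops the noisy feature 1, because the graph induced by features 0 and 2 is reproduced exactly by the fitted block model. A realistic version would likely use soft or learned memberships and a more scalable search. The hard-assignment local search shown here can also stall in a poor local optimum depending on `init`.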