Title
Data Segmentation via t-SNE, DBSCAN, and Random Forest
Authors
Abstract
This research proposes a data segmentation algorithm that combines t-SNE, DBSCAN, and a Random Forest classifier into an end-to-end pipeline. The pipeline separates data into natural clusters and produces a characteristic profile of each cluster based on the most important features. Cluster labels can be inferred for out-of-sample data, and the technique generalizes well to real data sets. We describe the algorithm and present case studies on the Iris and MNIST data sets, as well as on real social media data from Instagram. This work is a proof of concept that sets the stage for more in-depth theoretical analysis.
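The abstract describes the pipeline only at a high level. What follows is a minimal Python sketch of one way the three stages could fit together using scikit-learn. It is not the authors' implementation. The function name `segment`, the standardization of the inputs and of the t-SNE embedding, and the defaults `eps=0.3`, `min_samples=5`, and `n_top=2` are all illustrative assumptions.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler


def segment(X, eps=0.3, min_samples=5, n_top=2, random_state=0):
    """Hypothetical sketch of a t-SNE -> DBSCAN -> Random Forest pipeline.

    eps, min_samples, n_top and both standardization steps are assumptions
    chosen for illustration, not values taken from the paper.
    """
    X_std = StandardScaler().fit_transform(X)

    # 1. Embed the data in 2-D with t-SNE.
    emb = TSNE(n_components=2, random_state=random_state).fit_transform(X_std)

    # 2. Cluster the embedding with DBSCAN. Rescaling the embedding first keeps
    #    eps from depending on t-SNE's arbitrary output scale (assumption).
    emb_std = StandardScaler().fit_transform(emb)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(emb_std)

    # 3. Train a Random Forest on the ORIGINAL features to reproduce the
    #    cluster labels. DBSCAN noise points (label -1) are left out.
    mask = labels != -1
    if not mask.any():
        raise ValueError("DBSCAN marked every point as noise; try a larger eps")
    rf = RandomForestClassifier(n_estimators=200, random_state=random_state)
    rf.fit(X[mask], labels[mask])

    # 4. Profile each cluster by the mean of the globally most important features.
    top = np.argsort(rf.feature_importances_)[::-1][:n_top]
    profiles = {
        c: X[labels == c][:, top].mean(axis=0) for c in np.unique(labels[mask])
    }
    return labels, rf, top, profiles


# Usage on Iris. Out-of-sample inference is just rf.predict on raw features.
X = load_iris().data
labels, rf, top, profiles = segment(X)
new_labels = rf.predict(X[:5])
```

Training the forest on the original features, rather than on the embedding, is what makes out-of-sample labeling possible in this sketch. scikit-learn's `TSNE` offers only `fit` and `fit_transform`, with no `transform` for unseen points, so new rows are labeled by the forest instead of being re-embedded.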