Paper Title

Interactive Exploration of Large Dendrograms with Prototypes

Authors

Andee Kaplan, Jacob Bien

Abstract

Hierarchical clustering is one of the standard methods taught for identifying and exploring the underlying structures that may be present within a data set. Students are shown examples in which the dendrogram, a visual representation of the hierarchical clustering, reveals a clear clustering structure. However, in practice, data analysts today frequently encounter data sets whose large scale undermines the usefulness of the dendrogram as a visualization tool. Densely packed branches obscure structure, and overlapping labels are impossible to read. In this paper we present a new workflow for performing hierarchical clustering via the R package called protoshiny that aims to restore hierarchical clustering to its former role of being an effective and versatile visualization tool. Our proposal leverages interactivity combined with the ability to label internal nodes in a dendrogram with a representative data point (called a prototype). After presenting the workflow, we provide three case studies to demonstrate its utility.
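
To make the idea of a prototype concrete, here is a minimal sketch of how a representative data point could be picked for one cluster. The abstract does not say how prototypes are chosen, so the sketch assumes a minimax rule: the prototype is the cluster member whose farthest fellow member is as close as possible. The function name `minimax_prototype` and the sample points are hypothetical and are not part of protoshiny.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def minimax_prototype(points):
    """Return (prototype, radius) for one cluster.

    Assumption: the prototype is the member that minimizes the maximum
    distance to every other member (a minimax rule). The radius is that
    maximum distance.
    """
    best_point, best_radius = None, float("inf")
    for candidate in points:
        # How far is the farthest cluster member from this candidate?
        radius = max(dist(candidate, other) for other in points)
        if radius < best_radius:
            best_point, best_radius = candidate, radius
    return best_point, best_radius


# Hypothetical cluster of four 2-D observations.
cluster = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
prototype, radius = minimax_prototype(cluster)
print(prototype, radius)
```

Labeling an internal dendrogram node with such a point gives the reader one actual observation that stands in for the whole branch, which is what keeps a collapsed subtree interpretable.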
