Paper Title

Clustering Optimisation Method for Highly Connected Biological Data

Author

Tjörnhammar, Richard

Abstract

Currently, data-driven discovery in the biological sciences resides in finding segmentation strategies for multivariate data that produce sensible descriptions of the data. Clustering is but one of several approaches, and it sometimes falls short because it is difficult to assess reasonable cutoffs or the number of clusters that need to be formed, or because an approach fails to preserve the topological properties of the original system in its clustered form. In this work, we show how a simple metric for connectivity clustering evaluation leads to an optimised segmentation of biological data. The novelty of the work resides in the creation of a simple optimisation method for clustering crowded data. The resulting clustering approach relies only on metrics derived from the inherent properties of the clustering. The new method, which is easy to implement, facilitates knowledge for optimised clustering. We discuss how the clustering optimisation strategy corresponds to the viable information content yielded by the final segmentation. We further elaborate on how the clustering results, in the optimal solution, correspond to prior knowledge of three different data sets.
