Paper title
Elastic Coupled Co-clustering for Single-Cell Genomic Data
Authors
Abstract
Recent advances in single-cell technologies have enabled us to profile genomic features at unprecedented resolution, and datasets from multiple domains are available, including datasets that profile different types of genomic features and datasets that profile the same type of genomic features across different species. These datasets typically differ in their power to identify unknown cell types through clustering, and data integration can potentially improve the performance of clustering algorithms. In this work, we formulate the problem in an unsupervised transfer learning framework, which uses knowledge learned from an auxiliary dataset to improve clustering performance on a target dataset. The degree of information shared between the target and auxiliary datasets can vary, and their distributions can also differ. To address these challenges, we propose an elastic coupled co-clustering based transfer learning algorithm that elastically propagates clustering knowledge obtained from the auxiliary dataset to the target dataset. Application to single-cell genomic datasets shows that our algorithm greatly improves clustering performance over traditional learning algorithms. The source code and datasets are available at https://github.com/cuhklinlab/elasticC3.