Paper Title

Conjoined Dirichlet Process

Authors

Michelle N. Ngo, Dustin S. Pluta, Alexander N. Ngo, Babak Shahbaba

Abstract

Biclustering is a class of techniques that simultaneously clusters the rows and columns of a matrix to sort heterogeneous data into homogeneous blocks. Although many algorithms have been proposed to find biclusters, existing methods either require the number of biclusters to be specified in advance or impose constraints on the model structure. To address these issues, we develop a novel nonparametric probabilistic biclustering method based on Dirichlet processes to identify biclusters with strong co-occurrence in both rows and columns. The proposed method uses dual Dirichlet process mixture models to learn row and column clusters, with the number of resulting clusters determined by the data rather than pre-specified. Probabilistic biclusters are identified by modeling the mutual dependence between the row and column clusters. We apply our method to two different applications, text mining and gene expression analysis, and demonstrate that it improves bicluster extraction in many settings compared to existing approaches.
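The abstract describes the method only at a high level. As a rough illustration of its two ingredients (a Dirichlet process mixture whose number of clusters is inferred rather than fixed, applied separately to the rows and to the columns of a matrix), the following Python sketch uses a collapsed Gibbs sampler for a Gaussian Dirichlet process mixture. It is not the authors' Conjoined Dirichlet Process: the Gaussian likelihood with a conjugate normal prior, the hyperparameters `alpha`, `sigma2`, and `tau2`, and the helper names `dp_cluster` and `dual_dp_biclusters` are all hypothetical choices made for illustration. The final step simply averages each row-cluster by column-cluster block instead of modeling the mutual dependence between the two clusterings.

```python
import math
import random


def _log_normal(x, mean, var):
    """Log density of a univariate normal N(mean, var) at x."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def _log_predictive(x, members, data, sigma2, tau2):
    """Log posterior-predictive density of vector x under one cluster.

    Assumed (illustrative) model: each coordinate is N(mu_d, sigma2) with a
    conjugate N(0, tau2) prior on mu_d, coordinates independent.
    An empty `members` list gives the prior predictive for a new cluster.
    """
    n = len(members)
    post_var = 1.0 / (1.0 / tau2 + n / sigma2)
    total = 0.0
    for d, xd in enumerate(x):
        s = sum(data[i][d] for i in members)
        post_mean = post_var * s / sigma2
        total += _log_normal(xd, post_mean, post_var + sigma2)
    return total


def dp_cluster(data, alpha=1.0, sigma2=1.0, tau2=100.0, sweeps=20, seed=0):
    """Hypothetical collapsed Gibbs sampler for a Gaussian DP mixture.

    The number of clusters is not fixed in advance: each point joins an
    existing cluster (weight = cluster size) or opens a new one
    (weight = alpha), following the Chinese restaurant process.
    """
    rng = random.Random(seed)
    n = len(data)
    labels = list(range(n))            # start with every point in its own cluster
    clusters = {k: [k] for k in range(n)}
    next_id = n
    for _ in range(sweeps):
        for i in range(n):
            # Remove point i from its cluster; drop the cluster if it is now empty.
            clusters[labels[i]].remove(i)
            if not clusters[labels[i]]:
                del clusters[labels[i]]
            keys = list(clusters)
            logw = [math.log(len(clusters[k]))
                    + _log_predictive(data[i], clusters[k], data, sigma2, tau2)
                    for k in keys]
            logw.append(math.log(alpha)
                        + _log_predictive(data[i], [], data, sigma2, tau2))
            # Sample an option with probability proportional to exp(logw).
            m = max(logw)
            weights = [math.exp(w - m) for w in logw]
            choice = rng.choices(range(len(weights)), weights=weights)[0]
            if choice == len(keys):          # last option means "open a new cluster"
                k = next_id
                next_id += 1
                clusters[k] = []
            else:
                k = keys[choice]
            clusters[k].append(i)
            labels[i] = k
    # Relabel clusters as 0, 1, 2, ... in order of first appearance.
    remap = {}
    return [remap.setdefault(k, len(remap)) for k in labels]


def dual_dp_biclusters(matrix, **kwargs):
    """Cluster rows and columns independently, then average each block."""
    row_labels = dp_cluster(matrix, **kwargs)
    columns = [list(col) for col in zip(*matrix)]
    col_labels = dp_cluster(columns, **kwargs)
    blocks = {}
    for r, rl in enumerate(row_labels):
        for c, cl in enumerate(col_labels):
            blocks.setdefault((rl, cl), []).append(matrix[r][c])
    means = {key: sum(v) / len(v) for key, v in blocks.items()}
    return row_labels, col_labels, means


if __name__ == "__main__":
    # Toy matrix with two row groups and two column groups.
    M = ([[5, 5, 5, 0, 0, 0] for _ in range(3)]
         + [[0, 0, 0, 5, 5, 5] for _ in range(3)])
    rows, cols, means = dual_dp_biclusters(M)
    print(rows, cols, means)
```

Because the sampler only ever weighs "join an existing cluster" against "open a new one", the number of row and column clusters emerges from the data, which is the property the abstract emphasizes; a faithful implementation of the paper would replace the independent block averaging with the authors' joint model of row and column cluster dependence.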
