Paper Title
A Bayesian Hierarchical Score for Structure Learning from Related Data Sets
Paper Authors
论文摘要
Score functions for learning the structure of Bayesian networks in the literature assume that data are a homogeneous set of observations; whereas it is often the case that they comprise different related, but not homogeneous, data sets collected in different ways. In this paper we propose a new Bayesian Dirichlet score, which we call Bayesian Hierarchical Dirichlet (BHD). The proposed score is based on a hierarchical model that pools information across data sets to learn a single encompassing network structure, while taking into account the differences in their probabilistic structures. We derive a closed-form expression for BHD using a variational approximation of the marginal likelihood, we study the associated computational cost and we evaluate its performance using simulated data. We find that, when data comprise multiple related data sets, BHD outperforms the Bayesian Dirichlet equivalent uniform (BDeu) score in terms of reconstruction accuracy as measured by the Structural Hamming distance, and that it is as accurate as BDeu when data are homogeneous. This improvement is particularly clear when either the number of variables in the network or the number of observations is large. Moreover, the estimated networks are sparser and therefore more interpretable than those obtained with BDeu thanks to a lower number of false positive arcs.
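To make the baseline concrete, below is a minimal sketch of the standard BDeu local score that BHD is compared against in the abstract, not of the BHD score itself (whose closed-form variational expression is derived in the paper). It assumes discrete data in a pandas DataFrame; the function name bdeu_local_score and the default equivalent sample size ess=1.0 are illustrative choices, not from the paper.

```python
import numpy as np
import pandas as pd
from scipy.special import gammaln


def bdeu_local_score(data: pd.DataFrame, node: str, parents: list, ess: float = 1.0) -> float:
    """Log BDeu local score of `node` given `parents` on fully observed discrete data.

    Uses the uniform Dirichlet hyperparameters alpha_jk = ess / (q * r),
    where q is the number of parent configurations and r the number of node states.
    Parent configurations never observed in the data contribute zero to the sum.
    """
    r = data[node].nunique()  # number of states of the node
    if parents:
        q = int(np.prod([data[p].nunique() for p in parents]))  # parent configurations
        groups = data.groupby(parents, observed=True)[node]
    else:
        q = 1
        groups = [(None, data[node])]

    alpha_j = ess / q          # prior mass per parent configuration
    alpha_jk = ess / (q * r)   # prior mass per (parent configuration, node state) cell

    score = 0.0
    for _, column in groups:
        counts = column.value_counts()
        n_j = counts.sum()
        score += gammaln(alpha_j) - gammaln(alpha_j + n_j)
        score += np.sum(gammaln(alpha_jk + counts.values) - gammaln(alpha_jk))
    return float(score)


# Usage sketch: compare two candidate parent sets for node "X" on toy data.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    df = pd.DataFrame({
        "A": rng.integers(0, 2, size=500),
        "B": rng.integers(0, 3, size=500),
    })
    df["X"] = (df["A"] + rng.integers(0, 2, size=500)) % 2  # X depends on A
    print(bdeu_local_score(df, "X", ["A"]))   # should beat the score below
    print(bdeu_local_score(df, "X", ["B"]))
```

A score-based structure learner would evaluate such local scores over candidate parent sets and search for the graph maximizing their sum; BHD replaces this single-data-set score with one that pools counts hierarchically across related data sets.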