Title
The Best Path Algorithm: Automatic Variable Selection via High-Dimensional Graphical Models
Authors
Abstract
This paper proposes a new algorithm for automatic variable selection in High-Dimensional Graphical Models. The algorithm selects the variables relevant to the node of interest on the basis of mutual information. Several contributions in the literature have investigated the use of mutual information to select the appropriate number of relevant features in large datasets, but most of them have focused on binary outcomes or required high computational effort. The algorithm proposed here overcomes these drawbacks because it is an extension of Chow and Liu's algorithm. Once the probabilistic structure of a High-Dimensional Graphical Model has been determined via this algorithm, the best path-step, which includes the variables with the most explanatory/predictive power for a variable of interest, is determined by computing the entropy coefficient of determination. The latter, being based on the notion of (symmetric) Kullback-Leibler divergence, turns out to be closely connected to the mutual information of the variables involved. Applying the algorithm to a wide range of real-world and publicly available datasets has highlighted its potential and its greater effectiveness compared with existing alternative methods.
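The structure-learning stage described above builds on Chow and Liu's algorithm, which approximates the joint distribution of discrete variables by the maximum-weight spanning tree whose edge weights are pairwise mutual information. The Python sketch below illustrates only that classical step, assuming discrete observations and plug-in (empirical frequency) probability estimates. It is not the authors' extension: the entropy coefficient of determination and the best path-step selection are not reproduced, and the function names `mutual_information` and `chow_liu_tree` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Plug-in mutual information (in nats) between two discrete sequences."""
    n = len(x)
    pxy = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) p(b))), written with counts:
        # (c/n) * log(c * n / (count(a) * count(b)))
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def chow_liu_tree(data):
    """Classical Chow-Liu tree: maximum-weight spanning tree on pairwise MI.

    data: dict mapping variable name -> list of discrete observations
    (all lists of equal length). Returns a list of (u, v, mi) edges,
    selected with Kruskal's algorithm.
    """
    names = list(data)
    weights = [
        (mutual_information(data[u], data[v]), u, v)
        for u, v in combinations(names, 2)
    ]
    weights.sort(reverse=True)  # heaviest edges first -> maximum spanning tree
    parent = {v: v for v in names}

    def find(v):
        # Union-find root lookup with path halving.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    edges = []
    for w, u, v in weights:
        ru, rv = find(u), find(v)
        if ru != rv:  # adding this edge does not create a cycle
            parent[ru] = rv
            edges.append((u, v, w))
    return edges
```

Kruskal's algorithm is used here because, once all pairwise mutual informations are computed, sorting the edges and merging components with union-find is the simplest way to obtain the maximum-weight spanning tree; for p variables the dominant cost is the p(p-1)/2 mutual-information evaluations.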