截短的矩阵功率迭代用于可区分的DAG学习

论文标题

截短的矩阵功率迭代用于可区分的DAG学习

Truncated Matrix Power Iteration for Differentiable DAG Learning

论文作者

Zhang, Zhen, Ng, Ignavier, Gong, Dong, Liu, Yuhang, Abbasnejad, Ehsan M, Gong, Mingming, Zhang, Kun, Shi, Javen Qinfeng

论文摘要

从观察数据中恢复潜在的定向无环图（DAG）结构，由于DAG限制的优化问题的组合性质，因此极具挑战性。最近，通过将DAG约束将DAG限制定义为平滑的相等性，通常基于邻接矩阵上的多项式，将DAG学习作为连续优化问题。现有方法将非常小的系数放在高阶多项式术语上以进行稳定，因为它们认为由于数字爆炸而导致高阶项上的大系数有害。相反，我们发现，当邻接矩阵的光谱射线量很小时，高阶术语上的大系数对DAG学习有益，并且高阶项的较大系数可以比小型零件要好得多。基于此，我们提出了一种具有有效截短的矩阵功率迭代的新型DAG学习方法，以基于几何序列的DAG约束。从经验上讲，我们的DAG学习方法在各种环境中的表现都优于先前的最新方法，在结构锤距离上通常以$ 3 $或更多的倍数。

Recovering underlying Directed Acyclic Graph (DAG) structures from observational data is highly challenging due to the combinatorial nature of the DAG-constrained optimization problem. Recently, DAG learning has been cast as a continuous optimization problem by characterizing the DAG constraint as a smooth equality one, generally based on polynomials over adjacency matrices. Existing methods place very small coefficients on high-order polynomial terms for stabilization, since they argue that large coefficients on the higher-order terms are harmful due to numeric exploding. On the contrary, we discover that large coefficients on higher-order terms are beneficial for DAG learning, when the spectral radiuses of the adjacency matrices are small, and that larger coefficients for higher-order terms can approximate the DAG constraints much better than the small counterparts. Based on this, we propose a novel DAG learning method with efficient truncated matrix power iteration to approximate geometric series based DAG constraints. Empirically, our DAG learning method outperforms the previous state-of-the-arts in various settings, often by a factor of $3$ or more in terms of structural Hamming distance.

下载PDF全文

下载文献需遵守相关版权规定

论文标题