Paper Title
A Probabilistic Representation of Deep Learning for Improving The Information Theoretic Interpretability
Paper Authors
Paper Abstract
In this paper, we propose a probabilistic representation of MultiLayer Perceptrons (MLPs) to improve their information-theoretic interpretability. Above all, we demonstrate that the assumption of i.i.d. activations does not hold for all the hidden layers of MLPs; thus the existing mutual information estimators based on non-parametric inference methods, e.g., empirical distributions and Kernel Density Estimation (KDE), are invalid for measuring the information flow in MLPs. Moreover, we introduce explicit probabilistic explanations for MLPs: (i) we define the probability space (Omega_F, T, P_F) for a fully connected layer f and demonstrate the great effect of the activation function on the probability measure P_F; (ii) we prove that the entire architecture of an MLP forms a Gibbs distribution P; and (iii) back-propagation aims to optimize the sample space Omega_F of all the fully connected layers of an MLP in order to learn an optimal Gibbs distribution P* that expresses the statistical connection between the input and the label. Based on these probabilistic explanations, we improve the information-theoretic interpretability of MLPs in three respects: (i) the random variable of f is discrete and the corresponding entropy is finite; (ii) the information bottleneck theory cannot correctly explain the information flow in MLPs once back-propagation is taken into account; and (iii) we propose novel information-theoretic explanations for the generalization of MLPs. Finally, we demonstrate the proposed probabilistic representation and information-theoretic explanations on a synthetic dataset and on benchmark datasets.
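For context, the non-parametric mutual information estimators the abstract critiques typically discretize activations and read I(X; T) off the joint histogram. Below is a minimal illustrative sketch of such a binning-based ("empirical distribution") estimator; the function name, bin count, and variable names are our own choices, not from the paper. Its validity rests on the i.i.d. assumption the abstract argues fails for hidden-layer activations.

```python
import numpy as np

def empirical_mutual_information(x, t, bins=10):
    """Estimate I(X; T) in nats from the joint histogram of two 1-D samples."""
    joint, _, _ = np.histogram2d(x, t, bins=bins)
    pxy = joint / joint.sum()                     # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)           # marginal of X
    pt = pxy.sum(axis=0, keepdims=True)           # marginal of T
    nz = pxy > 0                                  # avoid log(0) on empty bins
    # KL divergence between the joint and the product of marginals
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ pt)[nz])))
```

A quick sanity check: for a perfectly dependent pair (t = x) the estimate is large, while for independent samples it is close to zero (up to the well-known positive bias of histogram estimators).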
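To make the Gibbs-distribution claim concrete, here is a short illustrative sketch (our own, not the paper's construction) of the Gibbs/Boltzmann form P(y) = exp(-E(y)/T) / Z. In the abstract's framing, the MLP's output distribution takes this form, with the negative logits playing the role of energies E(y):

```python
import numpy as np

def gibbs_distribution(energies, temperature=1.0):
    """Return P(y) = exp(-E(y)/T) / Z over a finite set of states."""
    logits = -np.asarray(energies, dtype=float) / temperature
    logits -= logits.max()          # shift for numerical stability; Z cancels it
    unnorm = np.exp(logits)
    return unnorm / unnorm.sum()    # normalize by the partition function Z
```

Lower-energy states receive higher probability, and the temperature controls how sharply the distribution concentrates on them.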