Paper Title
A Probabilistic Representation of Deep Learning for Improving The Information Theoretic Interpretability
Paper Authors
Paper Abstract
In this paper, we propose a probabilistic representation of MultiLayer Perceptrons (MLPs) to improve their information-theoretic interpretability. Above all, we demonstrate that the assumption of i.i.d. activations does not hold for all the hidden layers of MLPs; thus the existing mutual information estimators based on non-parametric inference methods, e.g., empirical distributions and Kernel Density Estimation (KDE), are invalid for measuring the information flow in MLPs. Moreover, we introduce explicit probabilistic explanations for MLPs: (i) we define the probability space (Omega_F, T, P_F) for a fully connected layer f and demonstrate the great effect of the activation function on the probability measure P_F; (ii) we prove that the entire architecture of an MLP forms a Gibbs distribution P; and (iii) back-propagation aims to optimize the sample space Omega_F of all the fully connected layers of an MLP in order to learn an optimal Gibbs distribution P* that expresses the statistical connection between the input and the label. Based on these probabilistic explanations, we improve the information-theoretic interpretability of MLPs in three respects: (i) the random variable of f is discrete and the corresponding entropy is finite; (ii) the information bottleneck theory cannot correctly explain the information flow in MLPs once back-propagation is taken into account; and (iii) we propose novel information-theoretic explanations for the generalization of MLPs. Finally, we demonstrate the proposed probabilistic representation and information-theoretic explanations on a synthetic dataset and on benchmark datasets.
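For context, the non-parametric mutual information estimators the abstract critiques typically discretize activations and read I(X; T) off the joint histogram. Below is a minimal illustrative sketch of such a binning-based ("empirical distribution") estimator; the function name, bin count, and variable names are our own choices, not from the paper. Its validity rests on the i.i.d. assumption the abstract argues fails for hidden-layer activations.

```python
import numpy as np

def empirical_mutual_information(x, t, bins=10):
    """Estimate I(X; T) in nats from the joint histogram of two 1-D samples."""
    joint, _, _ = np.histogram2d(x, t, bins=bins)
    pxy = joint / joint.sum()                     # empirical joint distribution
    px = pxy.sum(axis=1, keepdims=True)           # marginal of X
    pt = pxy.sum(axis=0, keepdims=True)           # marginal of T
    nz = pxy > 0                                  # avoid log(0) on empty bins
    # KL divergence between the joint and the product of marginals
    return float(np.sum(pxy[nz] * np.log(pxy[nz] / (px @ pt)[nz])))
```

A quick sanity check: for a perfectly dependent pair (t = x) the estimate is large, while for independent samples it is close to zero (up to the well-known positive bias of histogram estimators).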
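To make the Gibbs-distribution claim concrete, here is a short illustrative sketch (our own, not the paper's construction) of the Gibbs/Boltzmann form P(y) = exp(-E(y)/T) / Z. In the abstract's framing, the MLP's output distribution takes this form, with the negative logits playing the role of energies E(y):

```python
import numpy as np

def gibbs_distribution(energies, temperature=1.0):
    """Return P(y) = exp(-E(y)/T) / Z over a finite set of states."""
    logits = -np.asarray(energies, dtype=float) / temperature
    logits -= logits.max()          # shift for numerical stability; Z cancels it
    unnorm = np.exp(logits)
    return unnorm / unnorm.sum()    # normalize by the partition function Z
```

Lower-energy states receive higher probability, and the temperature controls how sharply the distribution concentrates on them.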