Neuron-based explanations of neural networks sacrifice completeness and interpretability

Paper title

Neuron-based explanations of neural networks sacrifice completeness and interpretability

Authors

Nolan Dey, Eric Taylor, Alexander Wong, Bryan Tripp, Graham W. Taylor

Abstract

High-quality explanations of neural networks (NNs) should exhibit two key properties. Completeness ensures that they accurately reflect a network's function, and interpretability makes them understandable to humans. Many existing methods provide explanations of individual neurons within a network. In this work we provide evidence that for AlexNet pretrained on ImageNet, neuron-based explanation methods sacrifice both completeness and interpretability compared to activation principal components. Neurons are a poor basis for AlexNet embeddings because they don't account for the distributed nature of these representations. By examining two quantitative measures of completeness and conducting a user study to measure interpretability, we show the most important principal components provide more complete and interpretable explanations than the most important neurons. Much of the activation variance may be explained by examining relatively few high-variance PCs, as opposed to studying every neuron. These principal components also strongly affect network function, and are significantly more interpretable than neurons. Our findings suggest that explanation methods for networks like AlexNet should avoid using neurons as a basis for embeddings and instead choose a basis, such as principal components, which accounts for the high-dimensional and distributed nature of a network's internal representations. Interactive demo and code available at https://ndey96.github.io/neuron-explanations-sacrifice.
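The claim that a few high-variance PCs capture much of the activation variance can be illustrated with a minimal sketch. This is not the authors' code and does not use AlexNet: the activation matrix is synthetic, and the function name `variance_explained_curves` and the low-rank data model are assumptions made purely for illustration. It compares the cumulative fraction of variance covered by the top-k neurons against the top-k principal components.

```python
import numpy as np


def variance_explained_curves(activations):
    """Cumulative variance fraction covered by the top-k neurons and top-k PCs.

    activations: array of shape (n_samples, n_neurons).
    Returns (neuron_curve, pc_curve), each sorted from highest to lowest variance.
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    n_samples = centered.shape[0]

    # Variance of each neuron (diagonal of the covariance matrix).
    neuron_var = centered.var(axis=0, ddof=1)

    # Variance along each principal component (eigenvalues of the covariance),
    # obtained from singular values, which NumPy returns in descending order.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    pc_var = singular_values ** 2 / (n_samples - 1)

    total = neuron_var.sum()
    neuron_curve = np.cumsum(np.sort(neuron_var)[::-1]) / total
    pc_curve = np.cumsum(pc_var) / total
    return neuron_curve, pc_curve


# Hypothetical stand-in for a layer's activations: 5 latent factors spread
# across 64 "neurons", plus a small amount of independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(2000, 5))
mixing = rng.normal(size=(5, 64))
activations = latent @ mixing + 0.1 * rng.normal(size=(2000, 64))

neuron_curve, pc_curve = variance_explained_curves(activations)
print("top-5 neurons:", neuron_curve[4])
print("top-5 PCs:    ", pc_curve[4])
```

Because the synthetic representation is distributed across all neurons, the top five PCs cover nearly all the variance while the top five neurons cover only a small share. Variance explained is only one possible notion of completeness; the paper's own measures and its interpretability user study are not reproduced here.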
