Paper title
B-cos Networks: Alignment is All We Need for Interpretability
Authors
Abstract
We present a new direction for increasing the interpretability of deep neural networks (DNNs) by promoting weight-input alignment during training. For this, we propose to replace the linear transforms in DNNs with our B-cos transform. As we show, a sequence (network) of such transforms induces a single linear transform that faithfully summarises the full model computations. Moreover, the B-cos transform introduces alignment pressure on the weights during optimisation. As a result, those induced linear transforms become highly interpretable and align with task-relevant features. Importantly, the B-cos transform is designed to be compatible with existing architectures and we show that it can easily be integrated into common models such as VGGs, ResNets, InceptionNets, and DenseNets, whilst maintaining similar performance on ImageNet. The resulting explanations are of high visual quality and perform well under quantitative metrics for interpretability. Code is available at https://www.github.com/moboehle/B-cos.
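The abstract does not spell out the transform, so the following is only a minimal NumPy sketch under a stated assumption: that a B-cos unit computes |cos(x, w)|^(B-1) · (ŵ · x), where ŵ is the weight vector rescaled to unit norm. The names `bcos_layer` and `bcos_network` are hypothetical and are not taken from the linked repository. The sketch illustrates the claim that a sequence of such transforms collapses, for a given input, into a single input-dependent linear map that reproduces the network output exactly.

```python
import numpy as np


def bcos_layer(x, W, B=2.0, eps=1e-12):
    """Apply one assumed B-cos layer; return (output, effective matrix).

    Assumed form: out_j = |cos(x, w_j)|^(B-1) * (w_hat_j . x),
    where w_hat_j is row j of W rescaled to unit norm.
    """
    W_hat = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    lin = W_hat @ x                          # w_hat_j . x for every unit j
    cos = lin / (np.linalg.norm(x) + eps)    # cosine between x and each row
    scale = np.abs(cos) ** (B - 1.0)         # down-weights poorly aligned units
    W_eff = scale[:, None] * W_hat           # input-dependent linear map
    return W_eff @ x, W_eff


def bcos_network(x, weights, B=2.0):
    """Chain B-cos layers and accumulate the single induced linear map."""
    total = np.eye(x.shape[0])
    h = x
    for W in weights:
        h, W_eff = bcos_layer(h, W, B)
        total = W_eff @ total                # compose the per-layer maps
    return h, total


rng = np.random.default_rng(0)
weights = [rng.normal(size=(8, 5)), rng.normal(size=(3, 8))]
x = rng.normal(size=5)

y, W_total = bcos_network(x, weights)
# The induced matrix summarises the whole computation for this input.
print(np.allclose(y, W_total @ x))
```

Under this assumed form, setting B = 1 removes the cosine factor and leaves an ordinary linear layer with unit-norm weights, while larger B suppresses units whose weights are poorly aligned with the input. Each row of `W_total` has the same dimension as the input, so it can be inspected directly as an explanation of the corresponding output.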