Paper Title
Reconstruction-guided attention improves the robustness and shape processing of neural networks
Paper Authors
Paper Abstract
Many visual phenomena suggest that humans use top-down generative or reconstructive processes to create visual percepts (e.g., imagery, object completion, pareidolia), but little is known about the role reconstruction plays in robust object recognition. We built an iterative encoder-decoder network that generates an object reconstruction and used it as top-down attentional feedback to route the most relevant spatial and feature information to feed-forward object recognition processes. We tested this model using the challenging out-of-distribution digit recognition dataset, MNIST-C, where 15 different types of transformation and corruption are applied to handwritten digit images. Our model showed strong generalization performance against various image perturbations, on average outperforming all other models including feedforward CNNs and adversarially trained networks. Our model is particularly robust to blur, noise, and occlusion corruptions, where shape perception plays an important role. Ablation studies further reveal two complementary roles of spatial and feature-based attention in robust object recognition, with the former largely consistent with spatial masking benefits in the attention literature (the reconstruction serves as a mask) and the latter mainly contributing to the model's inference speed (i.e., number of time steps to reach a certain confidence threshold) by reducing the space of possible object hypotheses. We also observed that the model sometimes hallucinates a non-existing pattern out of noise, leading to highly interpretable human-like errors. Our study shows that modeling reconstruction-based feedback endows AI systems with a powerful attention mechanism, which can help us understand the role of generating perception in human visual processing.
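As a rough illustration of the mechanism described in the abstract, the sketch below shows one way an iterative encoder-decoder with reconstruction-guided spatial attention could be wired up in PyTorch. This is a minimal sketch, not the authors' implementation: the layer sizes, the multiplicative masking of the input by the reconstruction, the number of time steps, and the confidence threshold are all illustrative assumptions, and the feature-based attention pathway is omitted.

# Minimal sketch (assumed architecture, not the paper's code) of
# reconstruction-guided spatial attention: an encoder classifies the image,
# a decoder reconstructs it from the current class hypothesis, and on the
# next time step the reconstruction acts as a multiplicative spatial mask
# on the input, as suggested by the "reconstruction serves as a mask" idea.
import torch
import torch.nn as nn
import torch.nn.functional as F


class IterativeEncoderDecoder(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Encoder: image -> class logits (a small CNN stands in for the recognizer).
        self.encoder = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Flatten(),
            nn.Linear(32 * 7 * 7, num_classes),
        )
        # Decoder: class probabilities -> reconstructed 28x28 image (MNIST-sized).
        self.decoder = nn.Sequential(
            nn.Linear(num_classes, 128), nn.ReLU(),
            nn.Linear(128, 28 * 28), nn.Sigmoid(),
        )

    def forward(self, x, steps: int = 3, confidence: float = 0.9):
        attended = x  # the first pass sees the raw image (no feedback yet)
        logits = self.encoder(attended)
        for _ in range(steps - 1):
            probs = F.softmax(logits, dim=1)
            # Stop early once every item in the batch exceeds the confidence
            # threshold, mirroring the "inference speed" measure in the abstract.
            if probs.max(dim=1).values.min() > confidence:
                break
            # Reconstruct the object from the current hypothesis and use it as
            # a spatial mask that re-weights the input for the next pass.
            recon = self.decoder(probs).view(-1, 1, 28, 28)
            attended = x * recon
            logits = self.encoder(attended)
        return logits


# Usage: a random stand-in batch of MNIST-sized images.
model = IterativeEncoderDecoder()
logits = model(torch.rand(8, 1, 28, 28))
print(logits.shape)  # torch.Size([8, 10])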