Paper Title


Robust Explanations for Visual Question Answering

Authors

Patro, Badri N., Patel, Shivansh, Namboodiri, Vinay P.

Abstract


In this paper, we propose a method to obtain robust explanations for visual question answering (VQA) that correlate well with the answers. Our model explains the answers obtained through a VQA model by providing visual and textual explanations. The main challenges that we address are: i) answers and textual explanations obtained by current methods are not well correlated, and ii) current methods for visual explanation do not focus on the right location for explaining the answer. We address both challenges with a collaborative correlated module, which ensures that even without training against noise-based attacks, the enhanced correlation allows the right explanation and answer to be generated. We further show that this also improves the generated visual and textual explanations. The correlated module can be viewed as a robust way to verify that the answer and explanations are coherent. We evaluate this model on the VQA-X dataset and observe that the proposed method yields better textual and visual justifications in support of the decision. We showcase the robustness of the model against a noise-based perturbation attack using the corresponding visual and textual explanations, and present a detailed empirical analysis. Source code for our model is available at https://github.com/DelTA-Lab-IITK/CCM-WACV.
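The abstract does not specify how the collaborative correlated module is implemented. As a rough, hypothetical illustration of the underlying idea — scoring whether an answer and its explanation are coherent in a shared embedding space — one could measure their agreement with a cosine-similarity objective. All function names and the loss form below are illustrative assumptions, not the authors' actual architecture:

```python
import numpy as np

def coherence_score(answer_feat: np.ndarray, expl_feat: np.ndarray) -> float:
    """Cosine similarity between an answer embedding and an explanation
    embedding; values near 1.0 suggest the two are well correlated.
    (Illustrative only; not the paper's actual module.)"""
    a = answer_feat / np.linalg.norm(answer_feat)
    e = expl_feat / np.linalg.norm(expl_feat)
    return float(np.dot(a, e))

def correlation_loss(answer_feat: np.ndarray, expl_feat: np.ndarray) -> float:
    """A loss that, when minimized during training, pushes the answer and
    explanation embeddings toward agreement (zero when they align)."""
    return 1.0 - coherence_score(answer_feat, expl_feat)

# Identical embeddings are perfectly coherent, so the loss is ~0.
v = np.array([0.2, 0.5, 0.3])
print(round(correlation_loss(v, v), 6))
```

Minimizing such a loss jointly with the answer and explanation decoders is one plausible way to obtain the answer/explanation coherence the abstract describes.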
