SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering

Paper title

SwapMix: Diagnosing and Regularizing the Over-Reliance on Visual Context in Visual Question Answering

Paper authors

Vipul Gupta, Zhuowan Li, Adam Kortylewski, Chenyu Zhang, Yingwei Li, Alan Yuille

Paper abstract

While Visual Question Answering (VQA) has progressed rapidly, previous work has raised concerns about the robustness of current VQA models. In this work, we study the robustness of VQA models from a novel perspective: visual context. We suggest that the models over-rely on the visual context, i.e., irrelevant objects in the image, to make predictions. To diagnose a model's reliance on visual context and measure its robustness, we propose a simple yet effective perturbation technique, SwapMix. SwapMix perturbs the visual context by swapping features of irrelevant context objects with features from other objects in the dataset. Using SwapMix, we are able to change the answers to more than 45% of the questions for a representative VQA model. Additionally, we train the models with perfect sight and find that the context over-reliance depends heavily on the quality of the visual representations. Beyond diagnosis, SwapMix can also be applied as a data augmentation strategy during training to regularize the context over-reliance. By swapping the context object features, the model's reliance on context can be suppressed effectively. Two representative VQA models are studied using SwapMix: a co-attention model, MCAN, and a large-scale pretrained model, LXMERT. Our experiments on the popular GQA dataset show the effectiveness of SwapMix for both diagnosing model robustness and regularizing the over-reliance on visual context. The code for our method is available at https://github.com/vipulgupta1011/swapmix
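
The perturbation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the repository linked above); it only assumes that each image is a list of per-object feature vectors and that the indices of question-relevant objects are already known. The names `swapmix_perturb`, `answer_changes`, `feature_bank`, and `swap_prob` are hypothetical and chosen for illustration.

```python
import random


def swapmix_perturb(features, relevant, feature_bank, swap_prob=1.0, rng=None):
    """Return a copy of `features` with context-object features swapped.

    features     : list of per-object feature vectors for one image
    relevant     : set of indices of objects the question depends on
                   (assumed to be given; how they are found is not shown)
    feature_bank : feature vectors of other objects from the dataset
    swap_prob    : chance of swapping each context object (hypothetical knob)
    """
    rng = rng or random.Random()
    perturbed = []
    for idx, feat in enumerate(features):
        if idx not in relevant and rng.random() < swap_prob:
            # Context object: replace its feature with one from the dataset.
            perturbed.append(list(rng.choice(feature_bank)))
        else:
            # Relevant object (or unswapped context): keep the feature as is.
            perturbed.append(list(feat))
    return perturbed


def answer_changes(model, features, question, relevant, feature_bank,
                   trials=5, seed=0):
    """Diagnosis: does any context perturbation flip the model's answer?

    `model` is any callable taking (features, question) and returning an
    answer; this interface is an assumption of the sketch.
    """
    rng = random.Random(seed)
    original = model(features, question)
    for _ in range(trials):
        swapped = swapmix_perturb(features, relevant, feature_bank, 1.0, rng)
        if model(swapped, question) != original:
            return True
    return False
```

Under the same assumptions, the augmentation use mentioned in the abstract would amount to calling `swapmix_perturb` on each training sample (possibly with `swap_prob` below 1) before the forward pass, so that the model sees the same question and answer paired with varying context.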

