Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

Paper Title

Semantic Equivalent Adversarial Data Augmentation for Visual Question Answering

Authors

Ruixue Tang, Chao Ma, Wei Emma Zhang, Qi Wu, Xiaokang Yang

Abstract

Visual Question Answering (VQA) has achieved great success thanks to the rapid development of deep neural networks (DNNs). Data augmentation, one of the major tricks for training DNNs, has been widely used in many computer vision tasks. However, few works study data augmentation for VQA, and none of the existing image-based augmentation schemes (such as rotation and flipping) can be applied directly to VQA because of its semantic structure: an $\langle image, question, answer\rangle$ triplet must be maintained correctly. For example, a direction-related question-answer (QA) pair may no longer be true if the associated image is rotated or flipped. In this paper, instead of directly manipulating images and questions, we use generated adversarial examples for both images and questions as the augmented data. The augmented examples change neither the visual properties presented in the image nor the \textbf{semantic} meaning of the question, so the correctness of the $\langle image, question, answer\rangle$ triplet is still maintained. We then use adversarial learning to train a classic VQA model (BUTD) with our augmented data. Compared to the baseline model, our approach not only improves overall performance on VQAv2 but also withstands adversarial attacks effectively. The source code is available at https://github.com/zaynmi/seada-vqa.
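The two ideas in the abstract can be illustrated with a toy sketch. It is not the authors' implementation. The abstract does not say which attack is used, so the sign-gradient (FGSM-style) step below is an assumption, and a tiny logistic model stands in for BUTD. All function names (`hflip`, `object_side`, `fgsm_perturb`) and all numbers are hypothetical. The first half shows how a horizontal flip silently invalidates a direction-related answer; the second half shows a perturbation bounded by `eps` that raises the model's loss while leaving the input nearly unchanged.

```python
import math


def hflip(grid):
    """Mirror a 2D grid left-to-right (a naive image augmentation)."""
    return [row[::-1] for row in grid]


def object_side(grid):
    """Toy ground truth for the question 'Which side is the object on?'."""
    width = len(grid[0])
    cols = [c for row in grid for c, v in enumerate(row) if v > 0]
    mean_col = sum(cols) / len(cols)
    return "left" if mean_col < (width - 1) / 2 else "right"


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Probability of the positive answer under a toy logistic model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def loss(x, y, w, b):
    """Binary cross-entropy for a single example with label y in {0, 1}."""
    p = predict(x, w, b)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def fgsm_perturb(x, y, w, b, eps):
    """One sign-gradient step (assumed attack, not confirmed by the abstract).

    For the logistic model, d(loss)/d(x_i) = (p - y) * w_i, so the step
    moves every feature by exactly eps in the loss-increasing direction.
    """
    p = predict(x, w, b)
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]


if __name__ == "__main__":
    # Flipping changes the correct answer, so the original QA pair breaks.
    image = [[1, 0, 0, 0], [1, 0, 0, 0]]
    print(object_side(image), "->", object_side(hflip(image)))

    # A small adversarial perturbation keeps the input close to the original
    # but makes the example harder for the model.
    x, y, w, b = [0.2, 0.8, 0.5], 1, [1.5, -2.0, 0.5], 0.1
    x_adv = fgsm_perturb(x, y, w, b, eps=0.05)
    print(loss(x, y, w, b), "->", loss(x_adv, y, w, b))
```

In this sketch the perturbed input keeps its original label, which is what allows it to be reused as an extra training example. How the paper generates semantically equivalent questions is not described in the abstract and is not modelled here.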
