Paper Title
Generative Bias for Robust Visual Question Answering
Paper Authors
Paper Abstract
The task of Visual Question Answering (VQA) is known to be plagued by VQA models exploiting biases within the dataset to make their final predictions. Various ensemble-based debiasing methods have previously been proposed, in which an additional model is purposefully trained to be biased in order to train a robust target model. However, these methods compute the bias of a model simply from the label statistics of the training data or from single-modal branches. In this work, to better capture the bias that a target VQA model suffers from, we propose a generative method, called GenB, that trains the bias model directly from the target model. In particular, GenB employs a generative network to learn the bias in the target model through a combination of an adversarial objective and knowledge distillation. We then debias the target model using GenB as the bias model and, through extensive experiments on various VQA bias datasets, including VQA-CP2, VQA-CP1, GQA-OOD, and VQA-CE, demonstrate the effectiveness of our method, achieving state-of-the-art results with the LXMERT architecture on VQA-CP2.
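To make the abstract's description of GenB concrete, below is a minimal PyTorch-style sketch of one training step for the generative bias model. This is not the authors' code: all module interfaces (bias_model, target_model, discriminator), shapes, and loss weights are illustrative assumptions, and the discriminator's own update step is omitted for brevity.

```python
# Hypothetical sketch of the GenB bias-model objective described in the
# abstract: the bias model is pushed to mimic the target model's answer
# distribution via an adversarial loss plus knowledge distillation.
import torch
import torch.nn.functional as F

def genb_bias_step(bias_model, target_model, discriminator,
                   image, question, noise, answer):
    with torch.no_grad():
        target_logits = target_model(image, question)   # teacher, frozen here

    # Generative bias model: sees only random noise + the question,
    # so whatever it reproduces of the target must come from bias.
    bias_logits = bias_model(noise, question)

    # Adversarial objective: the bias model tries to fool a discriminator
    # that separates its answer distributions from the target model's.
    fake_score = discriminator(bias_logits.softmax(-1), question)
    adv_loss = F.binary_cross_entropy_with_logits(
        fake_score, torch.ones_like(fake_score))

    # Knowledge distillation: match the target model's soft predictions.
    kd_loss = F.kl_div(F.log_softmax(bias_logits, -1),
                       F.softmax(target_logits, -1),
                       reduction="batchmean")

    # Ground-truth supervision keeps the bias model a valid VQA model
    # (VQA answers are typically multi-label soft targets).
    gt_loss = F.binary_cross_entropy_with_logits(bias_logits, answer)

    # Relative weighting of the three terms is an assumption here.
    return gt_loss + adv_loss + kd_loss
```

In the full method, the trained bias model's predictions are then used as the biased branch of an ensemble-style debiasing loss for the target model; that step follows standard ensemble debiasing practice and is not shown here.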