Paper Title

Understanding Knowledge Gaps in Visual Question Answering: Implications for Gap Identification and Testing

Paper Authors

Goonmeet Bajaj, Bortik Bandyopadhyay, Daniel Schmidt, Pranav Maneriker, Christopher Myers, Srinivasan Parthasarathy

Paper Abstract

Visual Question Answering (VQA) systems are tasked with answering natural language questions corresponding to a presented image. Traditional VQA datasets typically contain questions related to the spatial information of objects, object attributes, or general scene questions. Recently, researchers have recognized the need to improve the balance of such datasets to reduce the system's dependency on memorized linguistic features and statistical biases, while aiming for enhanced visual understanding. However, it is unclear whether any latent patterns exist to quantify and explain these failures. As an initial step towards better quantifying our understanding of the performance of VQA models, we use a taxonomy of Knowledge Gaps (KGs) to tag questions with one or more types of KGs. Each Knowledge Gap (KG) describes the reasoning abilities needed to arrive at a resolution. After identifying KGs for each question, we examine the skew in the distribution of questions for each KG. We then introduce a targeted question generation model to reduce this skew, which allows us to generate new types of questions for an image. These new questions can be added to existing VQA datasets to increase the diversity of questions and reduce the skew.
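The abstract describes tagging each question with one or more Knowledge Gap (KG) types and then measuring the skew in the resulting distribution. A minimal sketch of that bookkeeping step is shown below; the KG labels (`spatial`, `attribute`, `counting`) are illustrative placeholders, not the paper's actual taxonomy, and the imbalance metric (max/min frequency ratio) is one simple choice among many.

```python
from collections import Counter

# Hypothetical KG tags per question; a question may carry multiple KG types.
# These labels are illustrative, not the paper's taxonomy.
question_kgs = [
    ["spatial"], ["attribute"], ["spatial", "counting"],
    ["spatial"], ["spatial"], ["attribute"],
]

def kg_distribution(tagged):
    """Fraction of all KG tags accounted for by each KG type."""
    counts = Counter(kg for kgs in tagged for kg in kgs)
    total = sum(counts.values())
    return {kg: n / total for kg, n in counts.items()}

def imbalance(dist):
    """Ratio of most- to least-frequent KG type; 1.0 means perfectly balanced."""
    probs = list(dist.values())
    return max(probs) / min(probs)

dist = kg_distribution(question_kgs)
```

A targeted question generator would then be pointed at the under-represented KG types (here, `counting`) to generate new questions that push the imbalance ratio toward 1.0.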
