Continual VQA for Disaster Response Systems

Paper Title

Continual VQA for Disaster Response Systems

Authors

Aditya Kane, V Manushree, Sahil Khose

Abstract

Visual Question Answering (VQA) is a multi-modal task in which a system answers a natural-language question about an input image, which requires semantically understanding the image's contents and responding in natural language. Using VQA for disaster management is an important line of research because of the range of questions a VQA system can answer. However, the main challenge is the delay caused by generating labels when assessing the affected areas. To tackle this, we deployed a pre-trained CLIP model, which is trained on image-text pairs. However, we empirically see that the model has poor zero-shot performance. Thus, we instead use pre-trained text and image embeddings from this model for our supervised training and surpass previous state-of-the-art results on the FloodNet dataset. We extend this to a continual setting, which is a more realistic scenario. We tackle the problem of catastrophic forgetting using various experience replay methods. Our training runs are available at https://wandb.ai/compyle/continual_vqa_final. Our code is available at https://github.com/AdityaKane2001/continual_vqa.
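The abstract does not say which experience replay variants were used, so the following is only a generic sketch of the idea, not the authors' implementation. It assumes a fixed-size memory filled by reservoir sampling, and the names `ReplayBuffer` and `mixed_batch` are hypothetical. Each training batch for the current task is mixed with a few stored examples from earlier tasks, so the model keeps seeing old data while it learns new data.

```python
import random
from typing import Any, List


class ReplayBuffer:
    """Fixed-size memory of past examples, filled by reservoir sampling (an assumption)."""

    def __init__(self, capacity: int, seed: int = 0) -> None:
        self.capacity = capacity
        self.memory: List[Any] = []
        self.seen = 0  # total number of examples offered so far
        self.rng = random.Random(seed)

    def add(self, example: Any) -> None:
        """Offer one example; every example seen so far is retained with equal probability."""
        self.seen += 1
        if len(self.memory) < self.capacity:
            self.memory.append(example)
            return
        # Pick a position among all examples seen; replace only if it falls inside the memory.
        slot = self.rng.randrange(self.seen)
        if slot < self.capacity:
            self.memory[slot] = example

    def sample(self, k: int) -> List[Any]:
        """Draw up to k stored examples without replacement."""
        return self.rng.sample(self.memory, min(k, len(self.memory)))


def mixed_batch(current_batch: List[Any], buffer: ReplayBuffer, replay_k: int) -> List[Any]:
    """Append replayed examples to the current batch, then store the new examples."""
    batch = list(current_batch) + buffer.sample(replay_k)
    for example in current_batch:
        buffer.add(example)
    return batch
```

In a training loop, each element of the returned batch would be an (image embedding, question embedding, answer) triple passed to the supervised classifier; here the examples are left as opaque objects so the sketch stays independent of any particular model.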
