MUST-VQA: MUltilingual Scene-text VQA

Paper Title

MUST-VQA: MUltilingual Scene-text VQA

Authors

Emanuele Vivoli, Ali Furkan Biten, Andres Mafla, Dimosthenis Karatzas, Lluis Gomez

Abstract

In this paper, we present a framework for Multilingual Scene Text Visual Question Answering that deals with new languages in a zero-shot fashion. Specifically, we consider the task of Scene Text Visual Question Answering (STVQA) in which the question can be asked in different languages and is not necessarily aligned with the scene text language. Thus, we first introduce a natural step towards a more generalized version of STVQA: MUST-VQA. Accounting for this, we discuss two evaluation scenarios in the constrained setting, namely IID and zero-shot, and we demonstrate that the models can perform on a par in the zero-shot setting. We further provide extensive experimentation and show the effectiveness of adapting multilingual language models to STVQA tasks.

Paper Title