VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning

Paper title

VieCap4H-VLSP 2021: ObjectAoA-Enhancing performance of Object Relation Transformer with Attention on Attention for Vietnamese image captioning

Authors

Nguyen, Nghia Hieu; Vo, Duong T. D.; Ha, Minh-Quan

Abstract

Image captioning is a challenging task that requires both understanding the visual information in an image and describing it in human language. In this paper, we propose an efficient way to improve the image understanding ability of transformer-based methods by extending the Object Relation Transformer architecture with the Attention on Attention mechanism. Experiments on the VieCap4H dataset show that our proposed method significantly outperforms the original Object Relation Transformer on both the public test and the private test of the Image Captioning shared task held by VLSP.
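
The abstract names the Attention on Attention (AoA) mechanism without spelling it out. The sketch below illustrates the commonly cited formulation of an AoA module: ordinary attention produces an attended vector, and a sigmoid gate computed from the query and that attended vector filters a linear "information" vector. It is not the authors' implementation. The function names, the single-query setting, and the weight shapes are assumptions made for illustration, and the geometry-aware attention of the Object Relation Transformer is not modeled here.

```python
import math


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, keys, values):
    """Scaled dot-product attention for a single query vector."""
    d = len(q)
    scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


def aoa(q, keys, values, W_info, b_info, W_gate, b_gate):
    """Attention on Attention for one query (illustrative sketch).

    W_info and W_gate are hypothetical weight matrices of shape
    (d_out, len(q) + len(values[0])); b_info and b_gate have length d_out.
    """
    v_hat = attention(q, keys, values)
    x = q + v_hat  # list concatenation: [q; v_hat]
    # Information vector: linear map of the query and the attended result.
    info = [a + b for a, b in zip(matvec(W_info, x), b_info)]
    # Attention gate: sigmoid of a second linear map of the same input.
    gate = [1.0 / (1.0 + math.exp(-(a + b)))
            for a, b in zip(matvec(W_gate, x), b_gate)]
    # Element-wise gating keeps only the information judged relevant.
    return [i * g for i, g in zip(info, gate)]
```

In a captioning model, q would typically be a decoder or encoder state and keys/values the detected object features; when the gate saturates near zero, the attended result is suppressed for that query.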
