A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

Paper Title

A Deep Neural Framework for Image Caption Generation Using GRU-Based Attention Mechanism

Authors

Khan, Rashid; Islam, M Shujah; Kanwal, Khadija; Iqbal, Mansoor; Hossain, Md. Imran; Ye, Zhongfu

Abstract

Image captioning is a fast-growing research area at the intersection of computer vision and natural language processing that involves generating textual descriptions of images. This study develops a system that uses a pre-trained convolutional neural network (CNN) to extract features from an image, integrates those features with an attention mechanism, and generates captions with a recurrent neural network (RNN). To encode an image into a feature vector representing its visual attributes, we employ multiple pre-trained convolutional neural networks. A gated recurrent unit (GRU) language model then serves as the decoder that constructs the descriptive sentence. To improve performance, we combine the Bahdanau attention model with the GRU so that learning can focus on a specific portion of the image. On the MSCOCO dataset, the experimental results show competitive performance against state-of-the-art approaches.
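
The attention step mentioned in the abstract can be illustrated with a minimal pure-Python sketch of Bahdanau (additive) attention. It is not the authors' implementation: the function names, the toy dimensions, and the random weights are all assumptions made for illustration. The sketch scores each image-region feature against the decoder's hidden state, normalizes the scores with a softmax, and returns the weighted sum of features as a context vector.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def bahdanau_attention(features, hidden, W1, W2, v):
    """Additive attention: score_i = v . tanh(W1 f_i + W2 h).

    features: list of region feature vectors (assumed to come from a CNN)
    hidden:   previous decoder (GRU) hidden state
    Returns the context vector and the attention weights.
    """
    proj_h = matvec(W2, hidden)
    scores = []
    for f in features:
        proj_f = matvec(W1, f)
        act = [math.tanh(a + b) for a, b in zip(proj_f, proj_h)]
        scores.append(sum(vi * ai for vi, ai in zip(v, act)))
    weights = softmax(scores)
    dim = len(features[0])
    # Context vector: attention-weighted sum of the region features.
    context = [sum(w * f[d] for w, f in zip(weights, features))
               for d in range(dim)]
    return context, weights


def random_matrix(rows, cols, rng):
    """Hypothetical helper: small random weights standing in for learned ones."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen only for illustration; they are not taken from the paper.
rng = random.Random(0)
num_regions, feat_dim, hid_dim, attn_dim = 4, 6, 5, 3
features = random_matrix(num_regions, feat_dim, rng)
hidden = [rng.uniform(-1, 1) for _ in range(hid_dim)]
W1 = random_matrix(attn_dim, feat_dim, rng)
W2 = random_matrix(attn_dim, hid_dim, rng)
v = [rng.uniform(-0.1, 0.1) for _ in range(attn_dim)]

context, weights = bahdanau_attention(features, hidden, W1, W2, v)
```

In a typical encoder-decoder captioner, the context vector would be combined with the embedding of the previous word and passed to the GRU at each decoding step; whether the paper wires it exactly this way is not stated in the abstract.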
