Aesthetic Visual Question Answering of Photographs

Paper Title

Aesthetic Visual Question Answering of Photographs

Authors

Xin Jin, Wu Zhou, Xinghui Zhou, Shuai Cui, Le Zhang, Jianwen Lv, Shu Zhao

Abstract

Aesthetic assessment of images takes two main forms: numerical assessment and language assessment. Aesthetic captioning of photographs is the only aesthetic language assessment task that has been addressed so far. In this paper, we propose a new task of aesthetic language assessment: aesthetic visual question answering (AVQA) of images. Given a question about the aesthetics of an image, the model predicts the answer. We use images from www.flickr.com. The objective QA pairs are generated by the proposed aesthetic attribute analysis algorithms. Moreover, we introduce subjective QA pairs that are converted from aesthetic numerical labels and from sentiment analysis by large-scale pre-trained models. We build the first aesthetic visual question answering dataset, AesVQA, which contains 72,168 high-quality images and 324,756 pairs of aesthetic questions. We propose two methods for adjusting the data distribution and show that they improve the accuracy of existing models. This is the first work that both addresses the task of aesthetic VQA and introduces subjectivity into VQA tasks. Experimental results show that our methods outperform other VQA models on this new task.
