Recommending Themes for Ad Creative Design via Visual-Linguistic Representations

Paper title

Recommending Themes for Ad Creative Design via Visual-Linguistic Representations

Authors

Yichao Zhou, Shaunak Mishra, Manisha Verma, Narayan Bhamidipati, Wei Wang

Abstract

There is a perennial need in the online advertising industry to refresh ad creatives, i.e., the images and text used to entice online users towards a brand. Such refreshes are required to reduce the likelihood of ad fatigue among online users and to incorporate insights from other successful campaigns in related product categories. Given a brand, coming up with themes for a new ad is a painstaking and time-consuming process for creative strategists, who typically draw inspiration from the images and text used in past ad campaigns, as well as from world knowledge about the brands. To automatically infer ad themes from such multimodal sources of information in past ad campaigns, we propose a theme (keyphrase) recommender system for ad creative strategists. The theme recommender is based on aggregating results from a visual question answering (VQA) task, which ingests the following: (i) ad images, (ii) text associated with the ads as well as Wikipedia pages on the brands in the ads, and (iii) questions about the ad. We leverage transformer-based cross-modality encoders to train visual-linguistic representations for our VQA task. We study two formulations of the VQA task, one as classification and one as ranking; via experiments on a public dataset, we show that cross-modal representations lead to significantly better classification accuracy and ranking precision-recall metrics. Cross-modal representations also outperform separate image and text representations. In addition, using multimodal information yields a significant lift over using only textual or visual information.
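The abstract describes recommending themes by aggregating per-ad VQA outputs and evaluating the ranking with precision-recall metrics, but it does not give the aggregation rule. The following is a minimal sketch under stated assumptions, not the authors' method: it assumes a VQA model has already produced, for each past ad of a brand, a score for every candidate keyphrase, and it averages those scores across ads to rank themes. The functions `recommend_themes` and `precision_recall_at_k`, the averaging rule, and the example keyphrases are all hypothetical.

```python
from collections import defaultdict


def recommend_themes(ad_scores, k):
    """Rank candidate keyphrases (themes) for one brand and return the top k.

    ad_scores: list of dicts, one per past ad, mapping keyphrase -> score
    produced by a (hypothetical) VQA model. Averaging across ads is an
    illustrative assumption; the paper's aggregation may differ.
    """
    totals = defaultdict(float)
    for scores in ad_scores:
        for phrase, score in scores.items():
            totals[phrase] += score
    n = max(len(ad_scores), 1)  # avoid division by zero for an empty brand
    # Sort by descending mean score; break ties alphabetically for determinism.
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1] / n, kv[0]))
    return [phrase for phrase, _ in ranked[:k]]


def precision_recall_at_k(recommended, relevant, k):
    """Standard precision@k and recall@k against a set of ground-truth themes."""
    top_k = recommended[:k]
    hits = sum(1 for phrase in top_k if phrase in relevant)
    precision = hits / k if k else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical per-ad VQA scores for three ads of one brand.
    ads = [
        {"adventure": 0.9, "family": 0.4, "speed": 0.2},
        {"adventure": 0.7, "speed": 0.6},
        {"family": 0.8, "luxury": 0.3},
    ]
    themes = recommend_themes(ads, k=2)
    print(themes)
    print(precision_recall_at_k(themes, {"adventure", "luxury"}, k=2))
```

Averaging rather than summing does not change the ranking here; it only keeps scores on the same scale as the per-ad outputs, which can make a score threshold easier to choose if one is added later.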
