Paper Title
Learning List-wise Representation in Reinforcement Learning for Ads Allocation with Multiple Auxiliary Tasks
Paper Authors
Paper Abstract
With the recent prevalence of reinforcement learning (RL), there has been tremendous interest in utilizing RL for ads allocation in recommendation platforms (e.g., e-commerce and news feed sites). To achieve better allocation, the input of recent RL-based ads allocation methods has been upgraded from a point-wise single item to a list-wise item arrangement. However, this also results in a high-dimensional space of state-action pairs, making it difficult to learn list-wise representations with good generalization ability. This further hinders the exploration of RL agents and causes poor sample efficiency. To address this problem, we propose a novel RL-based approach for ads allocation which learns better list-wise representations by leveraging task-specific signals on the Meituan food delivery platform. Specifically, we propose three different auxiliary tasks based on reconstruction, prediction, and contrastive learning respectively, according to prior domain knowledge on ads allocation. We conduct extensive experiments on the Meituan food delivery platform to evaluate the effectiveness of the proposed auxiliary tasks. Both offline and online experimental results show that the proposed method learns better list-wise representations and achieves higher revenue for the platform compared to state-of-the-art baselines.
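To make the idea of attaching reconstruction, prediction, and contrastive auxiliary objectives to a shared list-wise encoder concrete, below is a minimal PyTorch sketch. It is not the paper's actual architecture: the GRU encoder, the next-item prediction target, the dropout-based augmentation for the contrastive view, and all loss weights are illustrative assumptions.

```python
# Minimal sketch (assumptions only, not the paper's method): a shared list-wise
# encoder with three auxiliary heads whose losses are added to an RL objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ListEncoder(nn.Module):
    """Encodes a list of item feature vectors into a single list-wise state."""
    def __init__(self, item_dim=16, hidden_dim=64):
        super().__init__()
        self.gru = nn.GRU(item_dim, hidden_dim, batch_first=True)

    def forward(self, items):              # items: (batch, list_len, item_dim)
        _, h = self.gru(items)             # h: (1, batch, hidden_dim)
        return h.squeeze(0)                # (batch, hidden_dim)

class AuxHeads(nn.Module):
    """Reconstruction, prediction, and contrastive-projection heads."""
    def __init__(self, item_dim=16, hidden_dim=64, list_len=5, proj_dim=32):
        super().__init__()
        self.recon = nn.Linear(hidden_dim, list_len * item_dim)  # reconstruct the input list
        self.pred = nn.Linear(hidden_dim, item_dim)              # predict an assumed next-item target
        self.proj = nn.Linear(hidden_dim, proj_dim)              # projection for contrastive learning

def contrastive_loss(z1, z2, temperature=0.1):
    """InfoNCE between two views of the same list, using in-batch negatives."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / temperature                 # (batch, batch) similarity matrix
    labels = torch.arange(z1.size(0))                  # diagonal entries are positives
    return F.cross_entropy(logits, labels)

# --- toy forward/backward pass on random data ---
batch, list_len, item_dim = 8, 5, 16
items = torch.randn(batch, list_len, item_dim)
next_item = torch.randn(batch, item_dim)               # assumed prediction target
encoder = ListEncoder(item_dim)
heads = AuxHeads(item_dim, list_len=list_len)

state = encoder(items)
recon_loss = F.mse_loss(heads.recon(state), items.reshape(batch, -1))
pred_loss = F.mse_loss(heads.pred(state), next_item)
state_aug = encoder(F.dropout(items, p=0.2))           # second view via a simple augmentation
ctr_loss = contrastive_loss(heads.proj(state), heads.proj(state_aug))

rl_loss = torch.tensor(0.0)                            # placeholder for the actual RL objective
total = rl_loss + 0.1 * recon_loss + 0.1 * pred_loss + 0.1 * ctr_loss
total.backward()
```

In this kind of setup, the auxiliary losses only shape the shared encoder; the policy/value heads of the RL agent (omitted here) would consume the same list-wise state, which is what gives the auxiliary signals their effect on sample efficiency.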