Paper Title
Exploration in Online Advertising Systems with Deep Uncertainty-Aware Learning
Authors
Abstract
Modern online advertising systems inevitably rely on personalization methods such as click-through rate (CTR) prediction. Recent progress in CTR prediction benefits from the rich representation capabilities of deep learning and has achieved great success in large-scale industrial applications. However, these methods can suffer from a lack of exploration. Another line of prior work addresses the exploration-exploitation trade-off with contextual bandit methods, which have recently received less attention in industry because it is difficult to extend their flexibility with deep models. In this paper, we propose a novel Deep Uncertainty-Aware Learning (DUAL) method to learn CTR models based on Gaussian processes, which provides predictive uncertainty estimates while maintaining the flexibility of deep neural networks. DUAL can be easily implemented on existing models and deployed in real-time systems with minimal extra computational overhead. By linking the predictive uncertainty estimation ability of DUAL to well-known bandit algorithms, we further present DUAL-based ad-ranking strategies to boost long-term utilities such as social welfare in advertising systems. Experimental results on several public datasets demonstrate the effectiveness of our methods. Remarkably, an online A/B test deployed on the Alibaba display advertising platform shows an 8.2% social welfare improvement and an 8.0% revenue lift.
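To make the link between predictive uncertainty and bandit-style ranking concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a CTR model already returns a predictive mean and standard deviation per ad, and that ads are scored by bid times estimated CTR. The function name `rank_ads`, the parameter `kappa`, and the choice of upper confidence bound (UCB) and Thompson sampling (TS) as the "well-known bandit algorithms" are illustrative assumptions; the abstract does not specify them.

```python
import random


def rank_ads(bids, ctr_mean, ctr_std, strategy="ucb", kappa=1.0, rng=None):
    """Return ad indices sorted by descending score (bid * CTR estimate).

    Hypothetical helper: ctr_mean and ctr_std stand in for the predictive
    mean and standard deviation an uncertainty-aware CTR model would supply.
    """
    rng = rng or random.Random()
    scores = []
    for bid, mu, sigma in zip(bids, ctr_mean, ctr_std):
        if strategy == "greedy":
            ctr = mu                      # pure exploitation
        elif strategy == "ucb":
            ctr = mu + kappa * sigma      # optimism under uncertainty
        elif strategy == "ts":
            ctr = rng.gauss(mu, sigma)    # sample from the predictive distribution
        else:
            raise ValueError(f"unknown strategy: {strategy}")
        ctr = min(max(ctr, 0.0), 1.0)     # keep the estimate a valid probability
        scores.append(bid * ctr)
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy inputs (made up for illustration): ad 1 has a lower mean CTR than ad 0
# but much higher uncertainty, so UCB ranks it first while greedy does not.
bids = [1.0, 1.0, 1.0]
ctr_mean = [0.05, 0.04, 0.03]
ctr_std = [0.001, 0.02, 0.005]

greedy_order = rank_ads(bids, ctr_mean, ctr_std, strategy="greedy")
ucb_order = rank_ads(bids, ctr_mean, ctr_std, strategy="ucb", kappa=1.0)
ts_order = rank_ads(bids, ctr_mean, ctr_std, strategy="ts", rng=random.Random(0))
```

In this toy setting the greedy ranking follows the mean CTR alone, whereas the UCB ranking promotes the uncertain ad, which is the kind of exploration the abstract says purely predictive CTR models lack.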