Paper Title

ESAM: Discriminative Domain Adaptation with Non-Displayed Items to Improve Long-Tail Performance

Authors

Zhihong Chen, Rong Xiao, Chenliang Li, Gangfeng Ye, Haochuan Sun, Hongbo Deng

Abstract

Most ranking models are trained only on displayed items (mostly popular items), yet they are used to retrieve items from the entire space, which consists of both displayed and non-displayed items (mostly long-tail items). Because of this sample selection bias, long-tail items lack sufficient records to learn good feature representations, i.e., they suffer from data sparsity and cold-start problems. The resulting distribution discrepancy between displayed and non-displayed items causes poor long-tail performance. To this end, we propose an entire space adaptation model (ESAM) that addresses this problem from the perspective of domain adaptation (DA). ESAM treats displayed and non-displayed items as the source and target domains, respectively. Specifically, we design attribute correlation alignment, which considers the correlation between high-level attributes of items to achieve distribution alignment. Furthermore, we introduce two effective regularization strategies, center-wise clustering and self-training, to improve the DA process. Without requiring any auxiliary information or auxiliary domains, ESAM transfers knowledge from displayed items to non-displayed items to alleviate the distribution inconsistency. Experiments on two public datasets and a large-scale industrial dataset collected from Taobao demonstrate that ESAM achieves state-of-the-art performance, especially in the long-tail space. In addition, we deploy ESAM in the Taobao search engine, leading to a significant improvement in online performance. The code is available at https://github.com/A-bone1/ESAM.git.
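The abstract does not give the formula for attribute correlation alignment, so the following is only a rough sketch of the general idea, assuming a loss that penalizes the squared difference between the attribute-by-attribute correlation matrices of a displayed (source) batch and a non-displayed (target) batch. The function names, the normalization by the number of matrix entries, and the toy vectors are all illustrative assumptions and are not taken from the ESAM code.

```python
def correlation_matrix(features):
    """Return the L x L matrix of mean attribute products for a batch.

    `features` is a list of n item vectors, each of length L.
    Entry (j, k) is the average over items of features[i][j] * features[i][k].
    """
    n = len(features)
    dim = len(features[0])
    return [
        [sum(row[j] * row[k] for row in features) / n for k in range(dim)]
        for j in range(dim)
    ]


def correlation_alignment_loss(source, target):
    """Mean squared difference between source and target correlation matrices.

    Assumed form of the alignment penalty; the paper may define it differently.
    """
    cs = correlation_matrix(source)
    ct = correlation_matrix(target)
    dim = len(cs)
    total = sum(
        (cs[j][k] - ct[j][k]) ** 2 for j in range(dim) for k in range(dim)
    )
    return total / (dim * dim)


# Toy high-level attribute vectors (hypothetical values).
displayed = [[1.0, 0.0], [0.0, 1.0]]      # source domain batch
non_displayed = [[1.0, 1.0], [1.0, 1.0]]  # target domain batch

print(correlation_alignment_loss(displayed, displayed))      # 0.0
print(correlation_alignment_loss(displayed, non_displayed))  # 0.625
```

In a training setup of this kind, such a term would typically be added to the ranking loss so that the item encoder is pushed toward producing similar attribute correlations for both domains; how ESAM weights it against the other losses is not stated in the abstract.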
