Title
GUIM -- General User and Item Embedding with Mixture of Representation in E-commerce
Authors
Abstract
Our goal is to build a general representation (embedding) for each user and each product item across Alibaba's businesses, including Taobao and Tmall, which are among the world's biggest e-commerce websites. The representations of users and items play a critical role in various downstream applications, including recommendation systems, search, marketing, demand forecasting, and so on. Inspired by the BERT model in the natural language processing (NLP) domain, we propose the GUIM (General User Item embedding with Mixture of representation) model to achieve this goal with massive, structured, multi-modal data covering the interactions among hundreds of millions of users and items. We utilize mixture of representation (MoR) as a novel representation form to model the diverse interests of each user. In addition, we use the InfoNCE loss from contrastive learning to avoid the intractable computational cost caused by the large size of the item (token) vocabulary. Finally, we propose a set of representative downstream tasks to serve as a standard benchmark for evaluating the quality of the learned user and/or item embeddings, analogous to the GLUE benchmark in the NLP domain. Our experimental results on these downstream tasks clearly show the comparative value of the embeddings learned by our GUIM model.
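The InfoNCE loss mentioned above can be sketched as follows. This is a minimal illustration of the general technique with in-batch negatives, not the paper's exact formulation: each user embedding is contrasted against its matched item (the positive) and the other items in the batch (negatives), so the softmax runs over the batch size instead of the full item vocabulary. Function names and the temperature value are our own assumptions.

```python
import numpy as np

def info_nce_loss(user_emb, item_emb, temperature=0.1):
    """InfoNCE with in-batch negatives.

    user_emb, item_emb: (B, d) arrays where row i of item_emb is the
    positive item for row i of user_emb; the other B-1 rows act as
    negatives. This sidesteps a softmax over the full item vocabulary.
    """
    # L2-normalize so dot products become cosine similarities.
    u = user_emb / np.linalg.norm(user_emb, axis=1, keepdims=True)
    v = item_emb / np.linalg.norm(item_emb, axis=1, keepdims=True)
    logits = u @ v.T / temperature                 # (B, B) similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives sit on the diagonal; minimize their negative log-likelihood.
    return -np.mean(np.diag(log_probs))
```

As a sanity check, pairing each user with its own embedding (perfect alignment) yields a lower loss than pairing it with a mismatched item, which is exactly the gradient signal that pulls matched user/item pairs together.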