Paper Title
Learning to Embed Categorical Features without Embedding Tables for Recommendation
Paper Authors
Abstract
Embedding learning of categorical features (e.g., user/item IDs) is at the core of various recommendation models, including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table in which each row is a dedicated embedding vector for one unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen feature values (e.g., new video IDs), both of which are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework, Deep Hash Embedding (DHE), which replaces embedding tables with a deep embedding network that computes embeddings on the fly. DHE first encodes each feature value into a unique identifier vector using multiple hash functions and transformations, and then applies a DNN to convert the identifier vector into an embedding. The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during training to learn embedding generation. Empirical results show that DHE achieves AUC comparable to the standard one-hot full embedding, with smaller model sizes. Our work sheds light on the design of DNN-based alternative embedding schemes for categorical features that do not rely on embedding table lookup.
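The two-stage pipeline described in the abstract (a fixed hashing encoder followed by a learnable DNN) can be illustrated with a minimal forward-pass sketch. The abstract does not specify the hash family, the transformation, the number of hashes, or the network shape, so everything concrete below is an assumption chosen for illustration: universal hashing `((a*x + b) mod p) mod m`, a uniform map of bucket indices to [-1, 1], k = 64 hashes, and a small ReLU MLP. The class names `DHEEncoder` and `EmbeddingMLP` are hypothetical, and training is omitted entirely, so the network weights are just random initial values.

```python
import math
import random

# Hypothetical parameters, not taken from the abstract.
K_HASHES = 64        # number of hash functions k (length of the identifier vector)
NUM_BUCKETS = 10**6  # hash range m
PRIME = 2**61 - 1    # Mersenne prime for universal hashing; larger than any ID used here


class DHEEncoder:
    """Deterministic, non-learnable encoder: feature ID -> identifier vector.

    Only the k (a, b) hash parameters are stored; this does not grow with the vocabulary.
    """

    def __init__(self, k=K_HASHES, m=NUM_BUCKETS, seed=0):
        rng = random.Random(seed)
        # Each hash is h(x) = ((a*x + b) mod p) mod m, a classic universal hash family.
        self.params = [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]
        self.m = m

    def encode(self, feature_id):
        vec = []
        for a, b in self.params:
            h = ((a * feature_id + b) % PRIME) % self.m
            # Assumed uniform transformation: bucket index in [0, m-1] -> value in [-1, 1].
            vec.append(2.0 * h / (self.m - 1) - 1.0)
        return vec


class EmbeddingMLP:
    """Learnable part (untrained here): feed-forward network from identifier vector to embedding."""

    def __init__(self, sizes, seed=1):
        rng = random.Random(seed)
        self.layers = []
        for n_in, n_out in zip(sizes[:-1], sizes[1:]):
            scale = 1.0 / math.sqrt(n_in)
            weights = [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]
            self.layers.append((weights, [0.0] * n_out))

    def __call__(self, x):
        for i, (weights, bias) in enumerate(self.layers):
            x = [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]
            if i < len(self.layers) - 1:
                x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
        return x

    def num_params(self):
        return sum(len(w) * len(w[0]) + len(b) for w, b in self.layers)


encoder = DHEEncoder()
dnn = EmbeddingMLP([K_HASHES, 128, 128, 32])  # hypothetical widths; embedding dimension 32

emb_seen = dnn(encoder.encode(42))
emb_new = dnn(encoder.encode(987654321))  # an ID absent from any table still gets an embedding

# Parameter count of this network vs. a full table for a hypothetical 1M-ID vocabulary, dim 32.
print(dnn.num_params(), 10**6 * 32)  # prints: 28960 32000000
```

In this sketch the per-ID cost is computation rather than memory: no row is stored for any ID, and with many independent hashes a collision across all k positions is very unlikely, which is what makes the identifier vector effectively unique. The sizes and parameter counts here are illustrative only and are not the configurations or results reported in the paper.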