Learning to Embed Categorical Features without Embedding Tables for Recommendation

Paper title

Learning to Embed Categorical Features without Embedding Tables for Recommendation

Authors

Wang-Cheng Kang, Derek Zhiyuan Cheng, Tiansheng Yao, Xinyang Yi, Ting Chen, Lichan Hong, Ed H. Chi

Abstract

Embedding learning of categorical features (e.g. user/item IDs) is at the core of various recommendation models including matrix factorization and neural collaborative filtering. The standard approach creates an embedding table where each row represents a dedicated embedding vector for every unique feature value. However, this method fails to efficiently handle high-cardinality features and unseen feature values (e.g. new video ID) that are prevalent in real-world recommendation systems. In this paper, we propose an alternative embedding framework Deep Hash Embedding (DHE), replacing embedding tables by a deep embedding network to compute embeddings on the fly. DHE first encodes the feature value to a unique identifier vector with multiple hashing functions and transformations, and then applies a DNN to convert the identifier vector to an embedding. The encoding module is deterministic, non-learnable, and free of storage, while the embedding network is updated during the training time to learn embedding generation. Empirical results show that DHE achieves comparable AUC against the standard one-hot full embedding, with smaller model sizes. Our work sheds light on the design of DNN-based alternative embedding schemes for categorical features without using embedding table lookup.
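The two-stage pipeline described in the abstract (a fixed hashing encoder followed by a trainable network) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the abstract: the universal-hashing form `(a * x + b) mod p mod m`, the uniform scaling to [-1, 1], the values of `k`, `NUM_BUCKETS`, and the layer sizes, the ReLU activation, and all function names. The network weights are random and untrained, so the outputs only demonstrate the data flow.

```python
import math
import random

PRIME = 2_147_483_647    # assumed prime modulus (2^31 - 1) for universal hashing
NUM_BUCKETS = 1_000_000  # assumed hash range m, purely illustrative


def make_hash_params(k, seed=0):
    """Draw k fixed (a, b) pairs once; they are never updated by training."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME)) for _ in range(k)]


def encode(feature_id, hash_params):
    """Deterministic, storage-free encoding: k hash values scaled to [-1, 1]."""
    codes = []
    for a, b in hash_params:
        bucket = ((a * feature_id + b) % PRIME) % NUM_BUCKETS
        codes.append(2.0 * bucket / (NUM_BUCKETS - 1) - 1.0)
    return codes


def init_mlp(sizes, seed=1):
    """Randomly initialise a small MLP; in practice these weights are learned."""
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def embed(feature_id, hash_params, layers):
    """Compute an embedding on the fly: encode the ID, then run the network."""
    x = encode(feature_id, hash_params)
    for i, (weights, biases) in enumerate(layers):
        x = [sum(w * v for w, v in zip(row, x)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            x = [max(0.0, v) for v in x]  # ReLU on hidden layers only
    return x


hash_params = make_hash_params(k=32)
layers = init_mlp([32, 64, 16])
known = embed(42, hash_params, layers)            # an ID seen in training
unseen = embed(987_654_321, hash_params, layers)  # a new ID needs no table row
```

In this sketch the model size depends only on the hash parameters and network weights, not on the number of distinct feature values, which is the property the abstract contrasts with a one-row-per-value embedding table.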
