Paper Title
Bootstrap Latent Representations for Multi-modal Recommendation
Paper Authors
Paper Abstract
This paper studies the multi-modal recommendation problem, where the item multi-modality information (e.g., images and textual descriptions) is exploited to improve the recommendation accuracy. Besides the user-item interaction graph, existing state-of-the-art methods usually use auxiliary graphs (e.g., user-user or item-item relation graphs) to augment the learned representations of users and/or items. These representations are often propagated and aggregated over the auxiliary graphs using graph convolutional networks, which can be prohibitively expensive in computation and memory, especially for large graphs. Moreover, existing multi-modal recommendation methods usually leverage randomly sampled negative examples in the Bayesian Personalized Ranking (BPR) loss to guide the learning of user/item representations, which increases the computational cost on large graphs and may also bring noisy supervision signals into the training process. To tackle the above issues, we propose a novel self-supervised multi-modal recommendation model, dubbed BM3, which requires neither augmentations from auxiliary graphs nor negative samples. Specifically, BM3 first bootstraps latent contrastive views from the representations of users and items with a simple dropout augmentation. It then jointly optimizes three multi-modal objectives to learn the representations of users and items by reconstructing the user-item interaction graph and aligning modality features under both inter- and intra-modality perspectives. BM3 alleviates both the need for contrasting with negative examples and the complex graph augmentation from an additional target network for contrastive view generation. We show that BM3 outperforms prior recommendation models on three datasets with the number of nodes ranging from 20K to 200K, while achieving a 2-9X reduction in training time. Our code is available at https://github.com/enoche/BM3.
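To make the training recipe in the abstract concrete, below is a minimal sketch in PyTorch. It is a reconstruction from the abstract alone, not the authors' implementation: the class name `BM3Sketch`, the linear predictor, the dropout rate, and the cosine-based loss terms are all illustrative assumptions; see https://github.com/enoche/BM3 for the actual code.

```python
# Minimal sketch of the BM3 idea: dropout-perturbed target views plus a
# predictor with stop-gradient, so no negative samples or auxiliary graphs
# are needed. All design details here are assumptions, not the paper's code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class BM3Sketch(nn.Module):
    def __init__(self, dim: int, dropout: float = 0.5):
        super().__init__()
        # Simple dropout creates the perturbed "target" view of an embedding.
        self.dropout = nn.Dropout(dropout)
        # A linear predictor maps the online view toward the target view
        # (BYOL-style bootstrapping).
        self.predictor = nn.Linear(dim, dim)

    @staticmethod
    def _align(online: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
        # Negative cosine similarity; the target is detached so gradients
        # flow only through the online branch (stop-gradient).
        return 1.0 - F.cosine_similarity(online, target.detach(), dim=-1).mean()

    def forward(self, u_emb, i_emb, txt_emb, img_emb):
        # All inputs: (batch, dim) embeddings for observed user-item pairs,
        # with txt_emb/img_emb the items' modality features.
        u_tgt, i_tgt = self.dropout(u_emb), self.dropout(i_emb)
        u_on, i_on = self.predictor(u_emb), self.predictor(i_emb)

        # (1) Graph reconstruction: align each user with the item it
        # interacted with, and vice versa.
        rec_loss = self._align(u_on, i_tgt) + self._align(i_on, u_tgt)

        # (2) Inter-modality alignment: pull each modality's features toward
        # the item's (ID-embedding) target view.
        inter_loss = self._align(self.predictor(txt_emb), i_tgt) \
                   + self._align(self.predictor(img_emb), i_tgt)

        # (3) Intra-modality alignment: align each modality's online view
        # with its own dropout-perturbed target.
        intra_loss = self._align(self.predictor(txt_emb), self.dropout(txt_emb)) \
                   + self._align(self.predictor(img_emb), self.dropout(img_emb))

        return rec_loss + inter_loss + intra_loss

# Illustrative usage with random embeddings for a batch of 32 pairs.
model = BM3Sketch(dim=64)
u, i = torch.randn(32, 64), torch.randn(32, 64)
t, v = torch.randn(32, 64), torch.randn(32, 64)
loss = model(u, i, t, v)
loss.backward()
```

The stop-gradient on the target view is what allows bootstrapping without negatives: the online branch chases a perturbed, non-trainable copy of itself, which (as in BYOL-style methods) discourages collapse to a trivial constant embedding.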