Paper Title
VLSNR:Vision-Linguistics Coordination Time Sequence-aware News Recommendation
Paper Authors
Paper Abstract
News representation and user-oriented modeling are both essential for news recommendation. Most existing methods are based on textual information but ignore visual information and users' dynamic interests. However, compared to text-only content, multimodal semantics is beneficial for enhancing the comprehension of users' temporal and long-lasting interests. In our work, we propose a vision-linguistics coordinated, time sequence-aware news recommendation framework. First, a pretrained multimodal encoder is applied to embed images and texts into the same feature space. Then a self-attention network is used to learn the chronological click sequence. Additionally, an attentional GRU network is proposed to adequately model user preferences over time. Finally, the click history and user representation are embedded to calculate ranking scores for candidate news. Furthermore, we also construct a large-scale multimodal news recommendation dataset, V-MIND. Experimental results show that our model outperforms baselines and achieves SOTA on our independently constructed dataset.
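The abstract outlines a four-stage pipeline: multimodal encoding of each news item, self-attention over the chronological click sequence, an attentional GRU for temporal user preference, and a ranking score between the user representation and each candidate. Below is a minimal PyTorch sketch of that pipeline; the module names, layer sizes, single-layer attention, and dot-product scoring are illustrative assumptions, not the authors' released implementation.

```python
# Minimal sketch of a VLSNR-style user encoder and ranking score.
# All hyperparameters and module choices here are assumptions for illustration.
import torch
import torch.nn as nn


class UserEncoder(nn.Module):
    """Encodes a chronologically ordered click history into one user vector."""

    def __init__(self, dim: int = 512, heads: int = 8):
        super().__init__()
        # Self-attention over the click sequence (assumed single layer).
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # GRU captures the temporal evolution of interests.
        self.gru = nn.GRU(dim, dim, batch_first=True)
        # Additive attention pools GRU states into a single user representation.
        self.attn_query = nn.Linear(dim, 1)

    def forward(self, history: torch.Tensor) -> torch.Tensor:
        # history: (batch, seq_len, dim) multimodal news embeddings,
        # e.g. fused image + title vectors from a pretrained encoder.
        h, _ = self.self_attn(history, history, history)
        h, _ = self.gru(h)
        weights = torch.softmax(self.attn_query(h), dim=1)  # (batch, seq_len, 1)
        return (weights * h).sum(dim=1)                      # (batch, dim)


def ranking_score(user_vec: torch.Tensor, cand_vec: torch.Tensor) -> torch.Tensor:
    # Dot-product relevance between user and candidate news embeddings.
    return (user_vec * cand_vec).sum(dim=-1)


if __name__ == "__main__":
    batch, seq_len, dim = 2, 10, 512
    clicked = torch.randn(batch, seq_len, dim)   # stand-in for multimodal encoder outputs
    candidate = torch.randn(batch, dim)
    user = UserEncoder(dim)(clicked)
    print(ranking_score(user, candidate).shape)  # torch.Size([2])
```

In this sketch the pretrained multimodal encoder is left abstract (random tensors stand in for its outputs), since the abstract does not specify which encoder or fusion scheme is used.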