Paper Title
A Text-based Deep Reinforcement Learning Framework for Interactive Recommendation
Paper Authors
Paper Abstract
Due to its nature of learning from dynamic interactions and planning for long-run performance, reinforcement learning (RL) has recently received much attention in interactive recommender systems (IRSs). IRSs usually face a large discrete action space, which makes most existing RL-based recommendation methods inefficient. Moreover, data sparsity is another challenging problem that most IRSs are confronted with. While textual information such as reviews and descriptions is less sensitive to sparsity, existing RL-based recommendation methods either neglect it or are not suitable for incorporating it. To address these two problems, in this paper we propose a Text-based Deep Deterministic Policy Gradient framework (TDDPG-Rec) for IRSs. Specifically, we leverage textual information to map items and users into a feature space, which greatly alleviates the sparsity problem. Moreover, we design an effective method to construct an action candidate set. Guided by the policy vector dynamically learned by TDDPG-Rec, which expresses the user's preference, we can effectively select actions from the candidate set. Through experiments on three public datasets, we demonstrate that TDDPG-Rec achieves state-of-the-art performance over several baselines in a time-efficient manner.
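The abstract does not give implementation details, but the two key steps it describes can be illustrated with a minimal sketch: mapping items into a feature space from their text, and ranking an action candidate set by inner product with the actor's policy vector. Everything here (the names `embed_text` and `select_actions`, the embedding-averaging scheme, the inner-product scoring) is an illustrative assumption, not the paper's actual code.

```python
import numpy as np

# Hypothetical sketch of the pipeline described in the abstract, not TDDPG-Rec itself.

def embed_text(tokens, word_vectors):
    """Average pretrained word embeddings of an item's review/description
    tokens to obtain its feature-space representation (assumed scheme)."""
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        dim = len(next(iter(word_vectors.values())))
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

def select_actions(policy_vector, candidates, item_embeddings, k=10):
    """Score each candidate item by its inner product with the policy vector
    (the actor's output expressing the user's preference) and return the
    top-k item ids as the recommendation action."""
    scores = {i: float(policy_vector @ item_embeddings[i]) for i in candidates}
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy usage with random vectors, purely illustrative.
rng = np.random.default_rng(0)
item_embeddings = {i: rng.normal(size=64) for i in range(1000)}
policy_vector = rng.normal(size=64)    # would come from the DDPG actor network
candidates = list(range(200))          # a pre-built action candidate set
print(select_actions(policy_vector, candidates, item_embeddings, k=5))
```

The inner-product ranking is one plausible reading of "select actions from the candidate set by the policy vector"; the paper may use a different scoring function or network architecture.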