Paper Title
Fatigue-aware Bandits for Dependent Click Models
Paper Authors
Paper Abstract
As recommender systems send a massive amount of content to keep users engaged, users may experience fatigue, which is driven by 1) overexposure to irrelevant content and 2) boredom from seeing too many similar recommendations. To address this problem, we consider an online learning setting in which a platform learns a policy to recommend content that takes user fatigue into account. We propose an extension of the Dependent Click Model (DCM) to describe users' behavior. We stipulate that for each piece of content, its attractiveness to a user depends on its intrinsic relevance and a discount factor that measures how many similar pieces of content have already been shown. Users view the recommended content sequentially and click on the ones they find attractive. Users may leave the platform at any time, and the probability of exiting is higher when they do not like the content. Based on users' feedback, the platform learns the relevance of the underlying content as well as the discounting effect due to content fatigue. We refer to this learning task as the "fatigue-aware DCM Bandit" problem. We consider two learning scenarios depending on whether the discounting effect is known. For each scenario, we propose a learning algorithm that simultaneously explores and exploits, and we characterize its regret bound.
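To make the user model in the abstract concrete, the sketch below simulates a single browsing session under a fatigue-discounted, DCM-style cascade: each item's attractiveness is its intrinsic relevance multiplied by a discount that depends on how many similar items were already shown, and the user is more likely to exit after an unattractive item. The function names, the geometric discount, the similarity test, and the exit probabilities are illustrative assumptions for this sketch, not the paper's formal specification.

```python
import random

def simulate_dcm_session(items, relevance, discount, similar, exit_prob):
    """Simulate one user session under a fatigue-aware, DCM-style model.

    Illustrative sketch only; names and functional forms are assumptions.
      items      -- ordered list of recommended item ids
      relevance  -- dict: item id -> intrinsic relevance in [0, 1]
      discount   -- function: count of similar items already shown -> factor in [0, 1]
      similar    -- function: (item a, item b) -> True if the two items are similar
      exit_prob  -- (p_exit_after_click, p_exit_after_no_click)
    Returns the list of clicked items.
    """
    clicks, shown = [], []
    for item in items:
        # Fatigue: attractiveness is discounted by how many similar items came before.
        n_similar = sum(similar(item, prev) for prev in shown)
        attractiveness = relevance[item] * discount(n_similar)
        shown.append(item)
        if random.random() < attractiveness:
            clicks.append(item)
            leave = random.random() < exit_prob[0]
        else:
            # Exiting is more likely when the item is not attractive.
            leave = random.random() < exit_prob[1]
        if leave:
            break
    return clicks

# Toy usage: geometric fatigue discount, similarity given by a shared category prefix.
rel = {"a1": 0.8, "a2": 0.7, "b1": 0.6}
clicks = simulate_dcm_session(
    items=["a1", "a2", "b1"],
    relevance=rel,
    discount=lambda n: 0.5 ** n,
    similar=lambda x, y: x[0] == y[0],
    exit_prob=(0.1, 0.4),
)
print(clicks)
```

In the paper's learning task, the platform observes only the clicks and the exit point from sessions like this one and must estimate the relevance values (and, in the harder scenario, the discounting effect) while recommending.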