论文标题
通过线性决策边界通过政策学习来解释个性化
Interpretable Personalization via Policy Learning with Linear Decision Boundaries
论文作者
论文摘要
随着数字经济的兴起和有关消费者的可用信息的爆炸,商品和服务的有效个性化已成为公司改善收入和保持竞争优势的核心业务。本文通过政策学习的角度研究个性化问题,目标是学习一条决策规则(政策),该规则将消费者和产品特征(功能)映射到建议(动作)以优化结果(奖励)。我们专注于使用未知数据收集程序使用可用的历史数据进行离线学习,其中关键挑战是建议的非随机分配。此外,在许多业务和医疗应用中,政策的解释性至关重要。我们使用线性决策边界研究政策类别,以确保解释性,并提出使用因果推理中的工具来解决不平衡治疗方法的学习算法。我们研究了几种优化方案,以解决相关的非凸,非平滑优化问题,并发现贝叶斯优化算法有效。我们通过广泛的仿真研究测试算法,并将其应用于匿名的在线市场客户购买数据集,在这里,学识渊博的政策根据客户和产品功能提供个性化的折扣建议,以最大程度地提高卖家的总商品价值(GMV)。我们博学的政策在净销售收入中将平台的基线提高了88.2 \%,同时还提供了有关哪些功能对于决策过程重要的信息。我们的发现表明,我们提出的政策学习框架使用因果推理和贝叶斯优化的工具提供了一种有希望的实用方法,可以在广泛的应用程序上进行可解释的个性化。
With the rise of the digital economy and an explosion of available information about consumers, effective personalization of goods and services has become a core business focus for companies to improve revenues and maintain a competitive edge. This paper studies the personalization problem through the lens of policy learning, where the goal is to learn a decision-making rule (a policy) that maps from consumer and product characteristics (features) to recommendations (actions) in order to optimize outcomes (rewards). We focus on using available historical data for offline learning with unknown data collection procedures, where a key challenge is the non-random assignment of recommendations. Moreover, in many business and medical applications, interpretability of a policy is essential. We study the class of policies with linear decision boundaries to ensure interpretability, and propose learning algorithms using tools from causal inference to address unbalanced treatments. We study several optimization schemes to solve the associated non-convex, non-smooth optimization problem, and find that a Bayesian optimization algorithm is effective. We test our algorithm with extensive simulation studies and apply it to an anonymized online marketplace customer purchase dataset, where the learned policy outputs a personalized discount recommendation based on customer and product features in order to maximize gross merchandise value (GMV) for sellers. Our learned policy improves upon the platform's baseline by 88.2\% in net sales revenue, while also providing informative insights on which features are important for the decision-making process. Our findings suggest that our proposed policy learning framework using tools from causal inference and Bayesian optimization provides a promising practical approach to interpretable personalization across a wide range of applications.