论文标题
非参数选择模型的积极学习
Active Learning for Non-Parametric Choice Models
论文作者
论文摘要
我们研究了基于消费者的决定积极学习非参数选择模型的问题。我们提出一个负面结果,表明这种选择模型可能无法识别。为了克服可识别性问题,我们引入了选择模型的有向无环图(DAG)表示。事实证明,该表示形式可以编码有关选择模型的所有信息,这些信息可以从可用数据中推断出来,从某种意义上说,它允许计算所有选择概率。 我们确定了为项目集集合的确切选择概率,可以重建DAG。但是,试图扩展此方法以估算从主动学习过程中获得的嘈杂选择频率数据的DAG导致不准确性。为了应对这一挑战,我们提出了一种包容性排斥方法,该方法可以有效地管理DAG级别的错误传播,从而更准确地估算了DAG。利用这项技术,我们的算法估计了潜在的非参数选择模型的DAG表示。当随机均匀地绘制频繁排名时,该算法有效地(在多项式时间内)运行。它通过主动并反复提供各种项目并观察所选项目来了解频繁偏好类型中最受欢迎的项目的分布。我们证明,与相应的非活动学习估计算法相比,我们的算法更有效地恢复了消费者偏好的合成和公开数据集的一组频繁偏好。这些发现强调了我们的算法的价值以及积极学习方法在建模消费者行为中的更广泛适用性。
We study the problem of actively learning a non-parametric choice model based on consumers' decisions. We present a negative result showing that such choice models may not be identifiable. To overcome the identifiability problem, we introduce a directed acyclic graph (DAG) representation of the choice model. This representation provably encodes all the information about the choice model which can be inferred from the available data, in the sense that it permits computing all choice probabilities. We establish that given exact choice probabilities for a collection of item sets, one can reconstruct the DAG. However, attempting to extend this methodology to estimate the DAG from noisy choice frequency data obtained during an active learning process leads to inaccuracies. To address this challenge, we present an inclusion-exclusion approach that effectively manages error propagation across DAG levels, leading to a more accurate estimate of the DAG. Utilizing this technique, our algorithm estimates the DAG representation of an underlying non-parametric choice model. The algorithm operates efficiently (in polynomial time) when the set of frequent rankings is drawn uniformly at random. It learns the distribution over the most popular items among frequent preference types by actively and repeatedly offering assortments of items and observing the chosen item. We demonstrate that our algorithm more effectively recovers a set of frequent preferences on both synthetic and publicly available datasets on consumers' preferences, compared to corresponding non-active learning estimation algorithms. These findings underscore the value of our algorithm and the broader applicability of active-learning approaches in modeling consumer behavior.