Paper Title
Optimizing Pessimism in Dynamic Treatment Regimes: A Bayesian Learning Approach
Authors
Abstract
In this article, we propose a novel pessimism-based Bayesian learning method for learning optimal dynamic treatment regimes in the offline setting. When the coverage condition does not hold, which is common for offline data, existing solutions produce sub-optimal policies. The pessimism principle addresses this issue by discouraging the recommendation of actions that are less explored conditional on the state. However, nearly all pessimism-based methods rely on a key hyper-parameter that quantifies the degree of pessimism, and their performance can be highly sensitive to the choice of this parameter. We propose to integrate the pessimism principle with Thompson sampling and Bayesian machine learning to optimize the degree of pessimism. We derive a credible set whose boundary uniformly lower bounds the optimal Q-function, so no additional tuning of the degree of pessimism is required. We develop a general Bayesian learning method that works with a range of models, from Bayesian linear basis models to Bayesian neural network models. We develop a computational algorithm based on variational inference, which is highly efficient and scalable. We establish theoretical guarantees for the proposed method, and show empirically, through both simulations and a real data example, that it outperforms existing state-of-the-art solutions.
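To make the pessimism principle concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a single-stage, state-free toy problem with two actions, a conjugate Bayesian linear model with one-hot action features, and a pointwise posterior quantile as the lower bound (the paper instead derives a uniform credible set and uses variational inference). The names `posterior`, `pessimistic_action`, `alpha`, `noise_var`, and `prior_var` are hypothetical and chosen for illustration only.

```python
import numpy as np

def posterior(X, y, noise_var=1.0, prior_var=10.0):
    """Gaussian posterior N(mu, Sigma) over the weights of a Bayesian linear model.

    Assumes a zero-mean isotropic Gaussian prior and known noise variance.
    """
    d = X.shape[1]
    precision = X.T @ X / noise_var + np.eye(d) / prior_var
    Sigma = np.linalg.inv(precision)
    mu = Sigma @ X.T @ y / noise_var
    return mu, Sigma

def pessimistic_action(mu, Sigma, Phi, alpha=0.05, n_draws=4000, seed=0):
    """Pick the action whose lower posterior quantile of Q is largest.

    Phi has one feature row per action. The alpha-quantile of the sampled
    Q-values is an illustrative stand-in for a credible lower bound.
    """
    rng = np.random.default_rng(seed)
    W = rng.multivariate_normal(mu, Sigma, size=n_draws)  # posterior weight draws
    Q = W @ Phi.T                                         # shape (n_draws, n_actions)
    lower = np.quantile(Q, alpha, axis=0)
    return int(np.argmax(lower)), lower

# Toy offline data: action 0 is well explored (200 samples, mean reward 1.0),
# action 1 is barely explored (2 samples that happen to look better).
n_actions = 2
actions = np.array([0] * 200 + [1] * 2)
rewards = np.concatenate([np.tile([0.9, 1.1], 100), [1.5, 1.5]])
X = np.eye(n_actions)[actions]        # one-hot action features

mu, Sigma = posterior(X, rewards)
Phi = np.eye(n_actions)               # feature row for each candidate action

greedy = int(np.argmax(Phi @ mu))     # ignores posterior uncertainty
pess, lower = pessimistic_action(mu, Sigma, Phi)
print("greedy action:", greedy, "pessimistic action:", pess)
```

In this toy setup the greedy rule selects the barely explored action because its posterior mean is higher, while the pessimistic rule selects the well-explored action because its lower bound is tighter. Note that `alpha` here is itself a fixed tuning choice, which is exactly the kind of hyper-parameter the paper aims to avoid tuning.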