Paper Title

Policy-Based Bayesian Experimental Design for Non-Differentiable Implicit Models

Paper Authors

Vincent Lim, Ellen Novoseller, Jeffrey Ichnowski, Huang Huang, Ken Goldberg

Paper Abstract

For applications in healthcare, physics, energy, robotics, and many other fields, designing maximally informative experiments is valuable, particularly when experiments are expensive, time-consuming, or pose safety hazards. While existing approaches can sequentially design experiments based on prior observation history, many of these methods do not extend to implicit models, where simulation is possible but computing the likelihood is intractable. Furthermore, they often require either significant online computation during deployment or a differentiable simulation system. We introduce Reinforcement Learning for Deep Adaptive Design (RL-DAD), a method for simulation-based optimal experimental design for non-differentiable implicit models. RL-DAD extends prior work in policy-based Bayesian Optimal Experimental Design (BOED) by reformulating it as a Markov Decision Process with a reward function based on likelihood-free information lower bounds, which is used to learn a policy via deep reinforcement learning. The learned design policy maps prior histories to experiment designs offline and can be quickly deployed during online execution. We evaluate RL-DAD and find that it performs competitively with baselines on three benchmarks.
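
To make the reward construction described in the abstract more concrete, here is a minimal sketch, assuming a toy one-dimensional implicit simulator, a hand-coded rollout policy, and a simple analytic critic; none of these (including the names simulate, policy, critic, rollout, info_nce_reward and the constants B, T) come from the paper. The sketch rolls out the policy against a simulator that can be sampled but has no tractable likelihood, then scores the batch of histories with an InfoNCE-style contrastive lower bound on the mutual information between parameters and observations. A likelihood-free bound of this general form is what the abstract refers to as the reward used to train the design policy with deep reinforcement learning; in RL-DAD the policy and critic would be learned networks rather than the hand-coded stand-ins used here.

```python
# Illustrative sketch only, not the authors' implementation.
import numpy as np

rng = np.random.default_rng(0)
B, T = 256, 5  # contrastive batch of rollouts, experiments per rollout


def simulate(theta, design):
    # Implicit model: sampling y | theta, design is easy, but the likelihood is intractable.
    return np.sin(theta * design) + 0.1 * rng.standard_normal(np.shape(theta))


def policy(designs, ys):
    # Toy hand-coded policy mapping the history so far to the next design.
    # In RL-DAD this map is a neural network trained with deep RL.
    if len(designs) == 0:
        return np.full(B, 0.5)
    return np.clip(designs[-1] + 0.3 * np.tanh(ys[-1]), -2.0, 2.0)


def rollout():
    theta = rng.normal(size=B)          # draw parameters from the prior
    designs, ys = [], []
    for _ in range(T):
        d = policy(designs, ys)
        designs.append(d)
        ys.append(simulate(theta, d))
    h = np.stack(designs + ys, axis=1)  # flat history features, shape (B, 2T)
    return theta, h


def critic(theta, h):
    # Toy critic f(theta, h): S[i, j] scores how well theta_j explains history i.
    # In practice this is a learned network, since the model form is unknown.
    pred = np.sin(theta[None, :, None] * h[:, None, :T])   # (B, B, T)
    return -np.sum((h[:, None, T:] - pred) ** 2, axis=-1)  # (B, B)


def info_nce_reward(theta, h):
    # InfoNCE-style lower bound on I(theta; history), used as the terminal reward.
    S = critic(theta, h)
    pos = np.diag(S)
    row_max = S.max(axis=1, keepdims=True)
    logZ = np.log(np.mean(np.exp(S - row_max), axis=1)) + row_max[:, 0]
    return float(np.mean(pos - logZ))


theta, h = rollout()
print("estimated information lower bound (reward):", info_nce_reward(theta, h))
```

Note that this contrastive estimate is capped at log B, so a larger contrastive batch gives a tighter (but more expensive) reward signal.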
