Paper title
Bayesian Experience Reuse for Learning from Multiple Demonstrators
Paper authors
Paper abstract
Learning from demonstrations (LfD) improves the exploration efficiency of a learning agent by incorporating demonstrations from experts. However, demonstration data can often come from multiple experts with conflicting goals, making it difficult to incorporate safely and effectively in online settings. We address this problem in the static and dynamic optimization settings by modelling the uncertainty in source and target task functions using normal-inverse-gamma priors, whose corresponding posteriors are, respectively, learned from demonstrations and target data using Bayesian neural networks with shared features. We use this learned belief to derive a quadratic programming problem whose solution yields a probability distribution over the expert models. Finally, we propose Bayesian Experience Reuse (BERS) to sample demonstrations in accordance with this distribution and reuse them directly in new tasks. We demonstrate the effectiveness of this approach for static optimization of smooth functions, and transfer learning in a high-dimensional supply chain problem with cost uncertainty.
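To make the normal-inverse-gamma modelling mentioned in the abstract more concrete, the following is a minimal sketch of the standard conjugate update for a scalar Gaussian with unknown mean and variance. It is not the paper's method: the paper places such priors on task functions learned by Bayesian neural networks with shared features, which this sketch does not attempt to reproduce. The names `NIG` and `nig_posterior` and the parameters `mu`, `kappa`, `alpha`, `beta` are assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class NIG:
    """Normal-inverse-gamma belief over (mean, variance) of a Gaussian.

    Hypothetical container for illustration only:
      variance ~ InverseGamma(alpha, beta)
      mean | variance ~ Normal(mu, variance / kappa)
    """
    mu: float
    kappa: float
    alpha: float
    beta: float


def nig_posterior(prior: NIG, data: Sequence[float]) -> NIG:
    """Return the posterior after observing i.i.d. Gaussian samples."""
    n = len(data)
    if n == 0:
        return prior
    mean = sum(data) / n
    # Sum of squared deviations from the sample mean.
    ss = sum((x - mean) ** 2 for x in data)
    kappa_n = prior.kappa + n
    mu_n = (prior.kappa * prior.mu + n * mean) / kappa_n
    alpha_n = prior.alpha + n / 2.0
    beta_n = (
        prior.beta
        + 0.5 * ss
        + prior.kappa * n * (mean - prior.mu) ** 2 / (2.0 * kappa_n)
    )
    return NIG(mu_n, kappa_n, alpha_n, beta_n)


prior = NIG(mu=0.0, kappa=1.0, alpha=1.0, beta=1.0)
posterior = nig_posterior(prior, [1.0, 2.0, 3.0])
print(posterior)
```

In this sketch, a larger `kappa` and `alpha` after the update reflect reduced uncertainty about the mean and variance. A belief of this kind, held per expert and for the target task, is what the abstract describes as the input to the quadratic program over expert models.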