Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions

Paper Title

Contrastive Explanations for Reinforcement Learning via Embedded Self Predictions

Authors

Zhengxian Lin, Kim-Ho Lam, Alan Fern

Abstract

We investigate a deep reinforcement learning (RL) architecture that supports explaining why a learned agent prefers one action over another. The key idea is to learn action-values that are directly represented via human-understandable properties of expected futures. This is realized via the embedded self-prediction (ESP) model, which learns said properties in terms of human-provided features. Action preferences can then be explained by contrasting the future properties predicted for each action. To address cases where there are a large number of features, we develop a novel method for computing minimal sufficient explanations from an ESP. Our case studies in three domains, including a complex strategy game, show that ESP models can be effectively learned and support insightful explanations.
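
The contrast-based explanation described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that an action-value is a simple weighted sum of predicted future feature values (the abstract does not say how the properties are combined), it uses made-up feature names, weights, and numbers, and it takes a "minimal sufficient explanation" to mean the smallest set of favourable feature differences that outweighs all unfavourable ones, which is an assumed reading of the term.

```python
# Illustrative sketch only; all names, weights, and values are hypothetical.

def contrast(features_a, features_b, weights):
    """Per-feature contribution to Q(s, a) - Q(s, b).

    Assumes a linear combiner: Q(s, x) = sum_i weights[i] * features_x[i].
    """
    return {
        name: weights[name] * (features_a[name] - features_b[name])
        for name in weights
    }


def minimal_sufficient_explanation(deltas):
    """Smallest set of positive contributions whose sum exceeds the
    total magnitude of all negative contributions (assumed definition).

    Returns None when action a is not actually preferred over action b.
    """
    negative_total = -sum(d for d in deltas.values() if d < 0)
    # Taking the largest positive contributions first yields a set of
    # minimal size, because the top-k sum is the largest sum of any k items.
    positives = sorted(
        ((name, d) for name, d in deltas.items() if d > 0),
        key=lambda item: item[1],
        reverse=True,
    )
    chosen, running = [], 0.0
    for name, d in positives:
        if running > negative_total:
            break
        chosen.append(name)
        running += d
    return chosen if running > negative_total else None


# Hypothetical predicted future properties for two candidate actions.
weights = {"enemy_units_destroyed": 1.0, "own_units_lost": -1.0, "resources_gained": 0.5}
attack = {"enemy_units_destroyed": 5.0, "own_units_lost": 3.0, "resources_gained": 2.0}
gather = {"enemy_units_destroyed": 1.0, "own_units_lost": 1.0, "resources_gained": 4.0}

deltas = contrast(attack, gather, weights)
explanation = minimal_sufficient_explanation(deltas)
print(deltas)
print(explanation)
```

In this toy case the single feature "enemy_units_destroyed" is enough to explain why the first action is preferred, since its advantage outweighs the combined disadvantages on the other two features. With many features, reporting only such a minimal set keeps the explanation short.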
