Paper Title

Learning Behavioral Soft Constraints from Demonstrations

Authors

Arie Glazier, Andrea Loreggia, Nicholas Mattei, Taher Rahgooy, Francesca Rossi, Brent Venable

Abstract


Many real-life scenarios require humans to make difficult trade-offs: do we always follow all the traffic rules, or do we violate the speed limit in an emergency? These scenarios force us to weigh collective rules and norms against our own personal objectives and desires. To create effective AI-human teams, we must equip AI agents with a model of how humans make these trade-offs in complex environments where there are implicit and explicit rules and constraints. Agents equipped with such models will be able to mirror human behavior and/or to draw human attention to situations where decision making could be improved. To this end, we propose a novel inverse reinforcement learning (IRL) method, Max Entropy Inverse Soft Constraint IRL (MESC-IRL), for learning implicit hard and soft constraints over states, actions, and state features from demonstrations in deterministic and non-deterministic environments modeled as Markov Decision Processes (MDPs). Our method enables agents to implicitly learn human constraints and desires without explicit modeling by the agent designer, and to transfer these constraints between environments. Our method generalizes prior work, which considered only deterministic hard constraints, and achieves state-of-the-art performance.
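The trade-off the abstract describes can be illustrated with a minimal sketch: treat soft constraints as learned penalty weights over state-action features, subtracted from the nominal reward. This is not the authors' implementation; the feature names, weights, and `constrained_reward` helper below are illustrative assumptions.

```python
import numpy as np

def constrained_reward(base_reward, features, penalty_weights):
    """Nominal reward minus the soft-constraint penalty phi(s, a) . w.

    A soft constraint lowers the value of a violating action rather
    than forbidding it, so it can be overridden when the stakes are
    high enough (hypothetical numbers throughout).
    """
    return base_reward - features @ penalty_weights

# Two candidate actions in the same state; the binary features are
# [violates_speed_limit, enters_emergency_lane].
weights = np.array([0.8, 2.0])  # assumed learned penalty weights

obey  = constrained_reward(1.0, np.array([0.0, 0.0]), weights)  # 1.0
speed = constrained_reward(3.0, np.array([1.0, 0.0]), weights)  # 3.0 - 0.8 = 2.2

# In an emergency (high base reward for arriving fast), violating the
# speed limit remains the better action despite the learned penalty.
print(obey, speed)
```

Under a maximum-entropy model, such penalized rewards would induce a stochastic policy with action probabilities proportional to the exponentiated reward, so demonstrations that occasionally violate a rule reveal how soft that constraint is.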
