Title
Generalized Inverse Planning: Learning Lifted non-Markovian Utility for Generalizable Task Representation
Authors
Abstract
In searching for a generalizable representation of temporally extended tasks, we identify two necessary constituents: the utility needs to be non-Markovian, to transfer temporal relations invariant to probability shift, and it needs to be lifted, to abstract away specific grounding objects. In this work, we study learning such a utility from human demonstrations. While inverse reinforcement learning (IRL) has been accepted as a general framework for utility learning, its fundamental formulation is one concrete Markov Decision Process; thus the learned reward function does not specify the task independently of the environment. Going beyond that, we define a domain of generalization that spans a set of planning problems following a schema. We hence propose a new quest, Generalized Inverse Planning, for utility learning in this domain. We further outline a computational framework, Maximum Entropy Inverse Planning (MEIP), that learns non-Markovian utility and associated concepts in a generative manner. The learned utility and concepts form a task representation that generalizes regardless of probability shift or structural change. Since the proposed generalization problem has not been widely studied yet, we carefully define an evaluation protocol, with which we illustrate the effectiveness of MEIP on two proof-of-concept domains and one challenging task: learning to fold from demonstrations.
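To make the maximum-entropy idea behind the abstract concrete, here is a toy sketch (our own illustrative construction, not the paper's implementation): trajectories are weighted proportionally to exp(theta * phi(tau)), where phi is a non-Markovian feature evaluated on the whole trajectory (here, a temporal ordering constraint), and theta is fit by gradient ascent on the MaxEnt log-likelihood. All names (`phi`, `maxent_step`, the toy domain) are hypothetical.

```python
import math

def phi(traj):
    """Temporal (non-Markovian) feature: 1.0 if 'a' occurs before 'b' in the
    whole trajectory, else 0.0. Such a feature cannot be expressed as a
    Markovian per-state reward."""
    if 'a' in traj and 'b' in traj and traj.index('a') < traj.index('b'):
        return 1.0
    return 0.0

def maxent_step(demos, candidates, theta, lr=0.5):
    """One gradient-ascent step on the MaxEnt log-likelihood: the gradient is
    the empirical feature expectation (over demonstrations) minus the model
    feature expectation (over the candidate plan space)."""
    empirical = sum(phi(d) for d in demos) / len(demos)
    weights = [math.exp(theta * phi(c)) for c in candidates]
    z = sum(weights)
    model = sum(w * phi(c) for w, c in zip(weights, candidates)) / z
    return theta + lr * (empirical - model)

demos = [list("axb"), list("ab")]      # demonstrations all satisfy "a before b"
candidates = [list("ab"), list("ba")]  # toy plan space
theta = 0.0
for _ in range(100):
    theta = maxent_step(demos, candidates, theta)
# theta grows positive: the learned utility prefers trajectories with a before b
```

Because phi scores the entire trajectory rather than individual transitions, the learned weight transfers to any environment where the same temporal relation can be evaluated, which is the intuition behind the non-Markovian utility in the abstract.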