MoCoDA: Model-based Counterfactual Data Augmentation

Paper title

MoCoDA: Model-based Counterfactual Data Augmentation

Authors

Silviu Pitis, Elliot Creager, Ajay Mandlekar, Animesh Garg

Abstract

The number of states in a dynamic process is exponential in the number of objects, making reinforcement learning (RL) difficult in complex, multi-object domains. For agents to scale to the real world, they will need to react to and reason about unseen combinations of objects. We argue that the ability to recognize and use local factorization in transition dynamics is a key element in unlocking the power of multi-object reasoning. To this end, we show that (1) known local structure in the environment transitions is sufficient for an exponential reduction in the sample complexity of training a dynamics model, and (2) a locally factored dynamics model provably generalizes out-of-distribution to unseen states and actions. Knowing the local structure also allows us to predict which unseen states and actions this dynamics model will generalize to. We propose to leverage these observations in a novel Model-based Counterfactual Data Augmentation (MoCoDA) framework. MoCoDA applies a learned locally factored dynamics model to an augmented distribution of states and actions to generate counterfactual transitions for RL. MoCoDA works with a broader set of local structures than prior work and allows for direct control over the augmented training distribution. We show that MoCoDA enables RL agents to learn policies that generalize to unseen states and actions. We use MoCoDA to train an offline RL agent to solve an out-of-distribution robotics manipulation task on which standard offline RL algorithms fail.
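The core idea in the abstract, applying a locally factored dynamics model to an augmented distribution of states and actions, can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a hypothetical two-object world in which each object's next position depends only on its own position and action, and it assumes the per-object model has the simple form `s_j + k * a_j`. All function names (`make_toy_data`, `fit_factor`, `augment`, `generate_counterfactuals`) are made up for illustration.

```python
import random


def make_toy_data(n, rng):
    """Two objects on a line; the logged data only ever has both on the same side."""
    data = []
    for _ in range(n):
        offset = rng.choice([0.0, 10.0])  # both objects share this offset
        s = (offset + rng.random(), offset + rng.random())
        a = (rng.uniform(-1, 1), rng.uniform(-1, 1))
        # Assumed ground-truth dynamics: each object moves independently.
        s_next = (s[0] + 0.5 * a[0], s[1] + 0.5 * a[1])
        data.append((s, a, s_next))
    return data


def fit_factor(data, j):
    """Least-squares fit of k in next_j = s_j + k * a_j, using only factor j's parents."""
    num = sum(a[j] * (s_next[j] - s[j]) for s, a, s_next in data)
    den = sum(a[j] ** 2 for _, a, _ in data)
    return num / den


def augment(data, rng):
    """Pair object 0's (state, action) with object 1's from a different transition."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    return [
        ((s0[0], s1[1]), (a0[0], a1[1]))
        for (s0, a0, _), (s1, a1, _) in zip(data, shuffled)
    ]


def generate_counterfactuals(data, rng):
    """Apply the factored model to the augmented state-action samples."""
    gains = [fit_factor(data, j) for j in range(2)]
    out = []
    for s, a in augment(data, rng):
        s_next = tuple(s[j] + gains[j] * a[j] for j in range(2))
        out.append((s, a, s_next))
    return out


rng = random.Random(0)
data = make_toy_data(200, rng)
counterfactuals = generate_counterfactuals(data, rng)
```

In this sketch the logged data never contains one object on the low side and the other on the high side, yet the recombined samples do. Because each factor is fit from its own parents only, the model's predictions on those unseen combinations remain accurate in this toy setting.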
