Uncertainty Aware System Identification with Universal Policies

Paper Title

Uncertainty Aware System Identification with Universal Policies

Authors

Buddhika Laknath Semage, Thommen George Karimpanal, Santu Rana, Svetha Venkatesh

Abstract

Sim2real transfer is primarily concerned with transferring policies trained in simulation to potentially noisy real-world environments. A common problem in sim2real transfer is estimating the real-world environmental parameters to which the simulated environment should be grounded. Although existing methods such as Domain Randomisation (DR) can produce robust policies by sampling from a distribution of parameters during training, there is no established method for identifying the parameters of the corresponding distribution for a given real-world setting. In this work, we propose Uncertainty-aware Policy Search (UncAPS), in which we use a Universal Policy Network (UPN) to store simulation-trained, task-specific policies across the full range of environmental parameters and then employ robust Bayesian optimisation to craft robust policies for the given environment by combining relevant UPN policies in a DR-like fashion. Such policy-driven grounding is expected to be more efficient because it estimates only task-relevant sets of parameters. Further, we account for the estimation uncertainties in the search process to produce policies that are robust against both aleatoric and epistemic uncertainties. We empirically evaluate our approach in a range of noisy, continuous control environments and show improved performance compared to competing baselines.
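
The policy-driven grounding idea in the abstract can be illustrated with a toy sketch. It is not the authors' implementation: the one-dimensional system, the function names (`universal_policy`, `real_episode`, `evaluate`, `search`), and the parameter range are all hypothetical. Plain random search stands in for the robust Bayesian optimisation the abstract describes, and no explicit modelling of aleatoric or epistemic uncertainty is attempted. The sketch only shows a search over a parameter distribution (mean and spread) that is scored by the return the parameter-conditioned policy obtains in a noisy target system.

```python
import random

THETA_RANGE = (0.5, 4.0)  # assumed range of the simulated environment parameter


def universal_policy(state, theta):
    """Toy stand-in for a UPN: the action that is optimal if the true parameter equals theta."""
    return -theta * state


def real_episode(thetas, true_theta, rng, noise=0.05, steps=20):
    """One noisy episode in the toy target system.

    At every step the policy is conditioned on a theta picked at random
    from the candidate set (a DR-like combination of policies).
    """
    state, ret = 1.0, 0.0
    for _ in range(steps):
        action = universal_policy(state, rng.choice(thetas))
        # The state reaches zero in one step only when theta matches true_theta.
        state = state + action / true_theta + rng.gauss(0.0, noise)
        ret -= state ** 2
    return ret


def evaluate(mu, sigma, true_theta, rng, n_thetas=5, n_episodes=5):
    """Average return of the policy set sampled from N(mu, sigma), clipped to THETA_RANGE."""
    lo, hi = THETA_RANGE
    thetas = [min(hi, max(lo, rng.gauss(mu, sigma))) for _ in range(n_thetas)]
    returns = [real_episode(thetas, true_theta, rng) for _ in range(n_episodes)]
    return sum(returns) / n_episodes


def search(true_theta, n_candidates=200, seed=0):
    """Random search over (mu, sigma); a placeholder for robust Bayesian optimisation."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_candidates):
        mu = rng.uniform(*THETA_RANGE)
        sigma = rng.uniform(0.0, 1.0)
        score = evaluate(mu, sigma, true_theta, rng)
        if best is None or score > best[0]:
            best = (score, mu, sigma)
    return best  # (average return, mu, sigma)
```

Calling `search(true_theta=2.0)` returns the best-scoring `(return, mu, sigma)` triple. In this toy the search never observes `true_theta` directly; it only sees returns, which is the sense in which the grounding is driven by the policy's performance and not by matching dynamics.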
