Learning Policies for Continuous Control via Transition Models

Paper Title

Learning Policies for Continuous Control via Transition Models

Authors

Justus Huebotter, Serge Thill, Marcel van Gerven, Pablo Lanillos

Abstract

It is doubtful that animals have perfect inverse models of their limbs (e.g., what muscle contraction must be applied to every joint to reach a particular location in space). However, in robot control, moving an arm's end-effector to a target position or along a target trajectory requires accurate forward and inverse models. Here we show that by learning the transition (forward) model from interaction, we can use it to drive the learning of an amortized policy. Hence, we revisit policy optimization in relation to the deep active inference framework and describe a modular neural network architecture that simultaneously learns the system dynamics from prediction errors and the stochastic policy that generates suitable continuous control commands to reach a desired reference position. We evaluated the model by comparing it against the baseline of a linear quadratic regulator, and conclude with additional steps to take toward human-like motor control.
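
The core idea in the abstract, learning a forward model from interaction and then using it to train a policy, can be illustrated with a deliberately small sketch. This is not the authors' architecture: the 1-D plant, the linear least-squares model, the deterministic gain `k`, and all constants below are assumptions chosen for illustration, standing in for the neural transition model and stochastic policy the paper describes.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D plant, unknown to the learner: x' = A_TRUE * x + B_TRUE * u
A_TRUE, B_TRUE = 1.0, 0.1

def true_step(x, u):
    return A_TRUE * x + B_TRUE * u

# 1) Learn the transition (forward) model from noisy interaction data
#    by minimizing squared one-step prediction error (least squares).
xs = rng.normal(size=200)
us = rng.normal(size=200)
xs_next = true_step(xs, us) + 0.01 * rng.normal(size=200)
features = np.column_stack([xs, us])
a_hat, b_hat = np.linalg.lstsq(features, xs_next, rcond=None)[0]

# 2) Train the policy u = k * (TARGET - x) using only rollouts
#    imagined with the learned model, never the true plant.
TARGET, HORIZON = 1.0, 20

def imagined_cost(k):
    x, cost = 0.0, 0.0
    for _ in range(HORIZON):
        u = k * (TARGET - x)
        x = a_hat * x + b_hat * u      # learned model, not true_step
        cost += (TARGET - x) ** 2      # squared distance to the reference
    return cost

k, lr, eps = 0.0, 0.005, 1e-4
for _ in range(2000):
    # Central finite-difference gradient of the imagined cost w.r.t. k
    grad = (imagined_cost(k + eps) - imagined_cost(k - eps)) / (2 * eps)
    k -= lr * grad

# 3) Deploy the learned policy on the true plant.
x = 0.0
for _ in range(HORIZON):
    x = true_step(x, k * (TARGET - x))
final_error = abs(TARGET - x)
```

In this toy setting the policy never sees the true dynamics during training; it only improves against the learned model, which is the role the abstract assigns to the transition model.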
