Title
Preference-Based Learning for User-Guided HZD Gait Generation on Bipedal Walking Robots
Authors
Abstract
This paper presents a framework that leverages both control theory and machine learning to obtain stable and robust bipedal locomotion without the need for manual parameter tuning. Traditionally, gaits are generated through trajectory optimization methods and then realized experimentally, a process that often requires extensive tuning due to differences between the model and the hardware. In this work, gait realization via hybrid zero dynamics (HZD) based optimization is formally combined with preference-based learning to systematically realize dynamically stable walking. Importantly, this learning approach does not require a carefully constructed reward function; instead, it relies on pairwise preferences provided by a human. The power of the proposed approach is demonstrated through two experiments on the planar biped AMBER-3M: the first with rigid point feet, and the second with model uncertainty induced by adding springs whose compliance was accounted for in neither the gait generation nor the controller. In both experiments, the framework achieves stable, robust, efficient, and natural walking in fewer than 50 iterations without relying on a simulation environment. These results demonstrate a promising step toward the unification of control theory and learning.
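The abstract does not spell out the learning algorithm, so the following is only a rough illustration of what "learning from human pairwise preferences" can look like, not the authors' method. It swaps in a simple Bradley-Terry utility model with a Gaussian prior, fit by gradient ascent, over a small discrete set of candidate gaits; each round the current best gait is compared against a random challenger. The function names, the gait parameters (step length, step duration), and the simulated "user" (a hidden scoring function standing in for a human) are all hypothetical.

```python
import math
import random


def fit_utilities(n_items, prefs, prior_var=1.0, lr=0.1, steps=500):
    """MAP utilities under a Bradley-Terry model with a Gaussian prior.

    prefs: list of (winner, loser) index pairs from pairwise queries.
    The N(0, prior_var) prior keeps estimates finite when the feedback
    is perfectly consistent.
    """
    u = [0.0] * n_items
    for _ in range(steps):
        # Gradient of the log prior.
        grad = [-ui / prior_var for ui in u]
        for w, l in prefs:
            # Modeled P(w preferred over l) = sigmoid(u[w] - u[l]).
            p = 1.0 / (1.0 + math.exp(-(u[w] - u[l])))
            grad[w] += 1.0 - p
            grad[l] -= 1.0 - p
        # Plain gradient ascent on the log posterior.
        u = [ui + lr * gi for ui, gi in zip(u, grad)]
    return u


def preference_loop(candidates, user_prefers, iterations=20, seed=0):
    """Hypothetical loop: current best vs. a random challenger each round."""
    rng = random.Random(seed)
    prefs = []
    best = rng.randrange(len(candidates))
    for _ in range(iterations):
        challenger = rng.randrange(len(candidates))
        if challenger == best:
            continue  # nothing to compare this round
        if user_prefers(candidates[challenger], candidates[best]):
            prefs.append((challenger, best))
        else:
            prefs.append((best, challenger))
        # Refit utilities on all feedback so far and pick the new best.
        u = fit_utilities(len(candidates), prefs)
        best = max(range(len(candidates)), key=u.__getitem__)
    return best, prefs


# Hypothetical gait parameters (step length [m], step duration [s]) and a
# simulated "user" who prefers gaits closer to a hidden target.
gaits = [(0.10, 0.40), (0.15, 0.45), (0.20, 0.50), (0.25, 0.55)]
target = (0.18, 0.46)


def score(g):
    return -((g[0] - target[0]) ** 2 + (g[1] - target[1]) ** 2)


best, prefs = preference_loop(gaits, lambda a, b: score(a) > score(b))
print(gaits[best], len(prefs))
```

The key design point this illustrates is that the user never assigns numeric rewards; the utility estimate comes entirely from "A or B?" answers. The sketch makes no claim about how many queries it needs; the fewer-than-50-iterations result above refers to the paper's own framework on hardware.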