Paper Title

Vision-State Fusion: Improving Deep Neural Networks for Autonomous Robotics

Paper Authors

Elia Cereda, Stefano Bonato, Mirko Nava, Alessandro Giusti, Daniele Palossi

Paper Abstract

Vision-based deep learning perception fulfills a paramount role in robotics, facilitating solutions to many challenging scenarios, such as acrobatic maneuvers of autonomous unmanned aerial vehicles (UAVs) and robot-assisted high-precision surgery. Control-oriented end-to-end perception approaches, which directly output control variables for the robot, commonly take advantage of the robot's state estimation as an auxiliary input. When intermediate outputs are estimated and fed to a lower-level controller, i.e., mediated approaches, the robot's state is commonly used as an input only for egocentric tasks, which estimate physical properties of the robot itself. In this work, we propose to apply a similar approach, for the first time to the best of our knowledge, to non-egocentric mediated tasks, where the estimated outputs refer to an external subject. We prove how our general methodology improves the regression performance of deep convolutional neural networks (CNNs) on a broad class of non-egocentric 3D pose estimation problems, with minimal computational cost. By analyzing three highly different use cases, spanning from grasping with a robotic arm to following a human subject with a pocket-sized UAV, our results consistently improve the R² regression metric, by up to +0.51, compared to their stateless baselines. Finally, we validate the in-field performance of a closed-loop autonomous cm-scale UAV on the human pose estimation task. Our results show a significant reduction, i.e., 24% on average, in the mean absolute error of our stateful CNN compared to a state-of-the-art stateless counterpart.
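The core idea described in the abstract is to feed the robot's own state estimate to the CNN as an auxiliary input alongside the camera frame, so the network can disambiguate the external subject's 3D pose from the robot's viewpoint. Below is a minimal PyTorch sketch of this vision-state fusion pattern; the layer sizes, the 4-dimensional state vector, the grayscale input resolution, and the late-fusion point (concatenating the state with the visual features before the regression head) are all illustrative assumptions, not the authors' exact architecture.

```python
# Minimal sketch of vision-state fusion for non-egocentric 3D pose
# regression. Layer sizes, the 4-dim state vector, and the late-fusion
# point are illustrative assumptions, not the paper's exact network.
import torch
import torch.nn as nn

class VisionStateFusionNet(nn.Module):
    def __init__(self, state_dim: int = 4, pose_dim: int = 4):
        super().__init__()
        # Convolutional backbone: extracts visual features from the camera frame.
        self.backbone = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # -> (B, 64)
        )
        # Regression head: consumes visual features concatenated with the
        # robot's own state estimate (e.g., attitude from the autopilot).
        self.head = nn.Sequential(
            nn.Linear(64 + state_dim, 64), nn.ReLU(),
            nn.Linear(64, pose_dim),  # e.g., subject's (x, y, z, yaw)
        )

    def forward(self, image: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        features = self.backbone(image)
        fused = torch.cat([features, state], dim=1)  # vision-state fusion
        return self.head(fused)

# Usage: one 160x96 grayscale frame plus a hypothetical 4-dim state estimate.
net = VisionStateFusionNet()
image = torch.randn(1, 1, 96, 160)
state = torch.randn(1, 4)   # e.g., roll, pitch, z, forward speed
pose = net(image, state)    # predicted 3D pose of the external subject
print(pose.shape)           # torch.Size([1, 4])
```

Fusing the state at the fully connected stage keeps the added computational cost minimal, consistent with the abstract's claim: only the first linear layer grows by a few extra weights per state dimension, while the convolutional backbone is unchanged.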
