Paper Title


Interactive Robotic Grasping with Attribute-Guided Disambiguation

Authors

Yang Yang, Xibai Lou, Changhyun Choi

Abstract


Interactive robotic grasping using natural language is one of the most fundamental tasks in human-robot interaction. However, language can be a source of ambiguity, particularly when there are ambiguous visual or linguistic contents. This paper investigates the use of object attributes in disambiguation and develops an interactive grasping system capable of effectively resolving ambiguities via dialogues. Our approach first predicts target scores and attribute scores through vision-and-language grounding. To handle ambiguous objects and commands, we propose an attribute-guided formulation of the partially observable Markov decision process (Attr-POMDP) for disambiguation. The Attr-POMDP utilizes target and attribute scores as the observation model to calculate the expected return of an attribute-based (e.g., "what is the color of the target, red or green?") or a pointing-based (e.g., "do you mean this one?") question. Our disambiguation module runs in real time on a real robot, and the interactive grasping system achieves a 91.43\% selection accuracy in the real-robot experiments, outperforming several baselines by large margins.
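To make the question-selection idea concrete, below is a minimal, hypothetical Python sketch of attribute-guided disambiguation. It is not the authors' Attr-POMDP implementation: instead of full POMDP planning, it compares question types with a simplified one-step expected-entropy-reduction heuristic over a belief computed from (made-up) target and attribute grounding scores. All function names, scores, and the entropy criterion are illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only: the paper plans over an Attr-POMDP; here we use a
# simplified one-step lookahead that picks the question whose expected
# posterior entropy over candidate targets is lowest. All values are made up.

def normalize(p):
    p = np.asarray(p, dtype=float)
    return p / p.sum()

def belief_entropy(belief):
    b = belief[belief > 0]
    return float(-(b * np.log(b)).sum())

def expected_entropy_attribute_question(belief, attr_probs):
    """Expected posterior entropy after an attribute question,
    e.g. "what is the color of the target, red or green?".

    attr_probs[i, k]: probability that candidate i has attribute value k
    (in the paper these would come from the grounding model's attribute scores).
    """
    exp_h = 0.0
    for k in range(attr_probs.shape[1]):
        p_answer = float((belief * attr_probs[:, k]).sum())  # chance of answer k
        if p_answer == 0.0:
            continue
        posterior = normalize(belief * attr_probs[:, k])      # Bayes update
        exp_h += p_answer * belief_entropy(posterior)
    return exp_h

def expected_entropy_pointing_question(belief, i):
    """Expected posterior entropy after "do you mean this one?" about candidate i."""
    p_yes = float(belief[i])
    posterior_no = belief.copy()
    posterior_no[i] = 0.0                                     # "no" rules out i
    h_no = belief_entropy(normalize(posterior_no)) if posterior_no.sum() > 0 else 0.0
    return p_yes * 0.0 + (1.0 - p_yes) * h_no                 # "yes" resolves fully

if __name__ == "__main__":
    # Hypothetical belief over three candidates from target grounding scores.
    belief = normalize([0.45, 0.40, 0.15])
    # Hypothetical color-attribute scores; columns = [red, green].
    attr_probs = np.array([[0.9, 0.1],
                           [0.1, 0.9],
                           [0.8, 0.2]])
    h_attr = expected_entropy_attribute_question(belief, attr_probs)
    h_point = min(expected_entropy_pointing_question(belief, i)
                  for i in range(len(belief)))
    choice = "attribute question" if h_attr < h_point else "pointing question"
    print(f"attribute: {h_attr:.3f}  pointing: {h_point:.3f}  -> ask {choice}")
```

In this toy example the attribute question splits the belief mass more evenly than pointing at any single object, so the heuristic prefers it; the actual system trades off question cost and grasping reward through the POMDP's expected return rather than entropy alone.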
