Paper Title

Relational Graph Learning on Visual and Kinematics Embeddings for Accurate Gesture Recognition in Robotic Surgery

Paper Authors

Yonghao Long, Jie Ying Wu, Bo Lu, Yueming Jin, Mathias Unberath, Yun-Hui Liu, Pheng Ann Heng, Qi Dou

Paper Abstract

Automatic surgical gesture recognition is fundamentally important for enabling intelligent cognitive assistance in robotic surgery. With recent advances in robot-assisted minimally invasive surgery, rich information including surgical videos and robotic kinematics can be recorded, providing complementary knowledge for understanding surgical gestures. However, existing methods either solely adopt uni-modal data or directly concatenate multi-modal representations, and thus cannot sufficiently exploit the informative correlations inherent in visual and kinematics data to boost gesture recognition accuracy. In this regard, we propose a novel online approach, the multi-modal relational graph network (MRG-Net), to dynamically integrate visual and kinematics information through interactive message propagation in the latent feature space. Specifically, we first extract embeddings from video and kinematics sequences with temporal convolutional networks and LSTM units. Next, we identify multiple relations among these multi-modal embeddings and leverage them through a hierarchical relational graph learning module. The effectiveness of our method is demonstrated with state-of-the-art results on the public JIGSAWS dataset, outperforming current uni-modal and multi-modal methods on both suturing and knot tying tasks. Furthermore, we validate our method on in-house visual-kinematics datasets collected with da Vinci Research Kit (dVRK) platforms at two centers, achieving consistently promising performance.
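To make the described pipeline concrete, below is a minimal PyTorch sketch of the architecture the abstract outlines: a temporal convolution for video embeddings, an LSTM for kinematics embeddings, one round of cross-modal message passing standing in for the hierarchical relational graph learning module, and a per-frame gesture classifier. All dimensions, the `MRGNetSketch` name, and the single gated message-passing step are illustrative assumptions, not the paper's exact design.

```python
import torch
import torch.nn as nn


class MRGNetSketch(nn.Module):
    """A minimal sketch of the MRG-Net pipeline described in the abstract.

    Layer sizes, the single-round message-passing rule, and all module
    names below are illustrative assumptions, not the paper's exact design.
    """

    def __init__(self, visual_dim=512, kin_dim=16, embed_dim=128, num_gestures=10):
        super().__init__()
        # Visual branch: temporal convolution over per-frame visual features.
        self.visual_tcn = nn.Conv1d(visual_dim, embed_dim,
                                    kernel_size=3, padding=2, dilation=2)
        # Kinematics branch: LSTM over the robot kinematics sequence.
        self.kin_lstm = nn.LSTM(kin_dim, embed_dim, batch_first=True)
        # Relational graph step: each modality node sends a message to the
        # other, then a gated update refreshes each node's state.
        self.msg_v2k = nn.Linear(embed_dim, embed_dim)
        self.msg_k2v = nn.Linear(embed_dim, embed_dim)
        self.update_v = nn.GRUCell(embed_dim, embed_dim)
        self.update_k = nn.GRUCell(embed_dim, embed_dim)
        self.classifier = nn.Linear(2 * embed_dim, num_gestures)

    def forward(self, visual_feats, kinematics):
        # visual_feats: (B, T, visual_dim), e.g. per-frame CNN features
        # kinematics:   (B, T, kin_dim),    e.g. tool poses and velocities
        v = self.visual_tcn(visual_feats.transpose(1, 2)).transpose(1, 2)
        k, _ = self.kin_lstm(kinematics)
        B, T, D = v.shape
        v_flat, k_flat = v.reshape(B * T, D), k.reshape(B * T, D)
        # One round of interactive message propagation between modalities.
        v_new = self.update_v(self.msg_k2v(k_flat), v_flat)
        k_new = self.update_k(self.msg_v2k(v_flat), k_flat)
        fused = torch.cat([v_new, k_new], dim=-1).reshape(B, T, -1)
        return self.classifier(fused)  # per-frame gesture logits


# Toy usage: 2 clips, 30 frames, 512-D visual and 16-D kinematic inputs.
model = MRGNetSketch()
logits = model(torch.randn(2, 30, 512), torch.randn(2, 30, 16))
print(logits.shape)  # torch.Size([2, 30, 10])
```

The GRU-style node update is just one common choice for graph message passing; the paper's hierarchical module handles multiple relation types, which this single-step sketch does not attempt to reproduce.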
