Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes

Title

Closing the Loop: Graph Networks to Unify Semantic Objects and Visual Features for Multi-object Scenes

Authors

Kim, Jonathan J. Y.; Urschler, Martin; Riddle, Patricia J.; Wicker, Jörg S.

Abstract

In Simultaneous Localization and Mapping (SLAM), Loop Closure Detection (LCD) is essential for minimizing drift when recognizing previously visited places. Visual Bag-of-Words (vBoW) has been an LCD algorithm of choice for many state-of-the-art SLAM systems. It uses a set of visual features to provide robust place recognition but fails to perceive the semantics or spatial relationships between feature points. Previous work has mainly addressed these issues by combining vBoW with semantic and spatial information from objects in the scene. However, these approaches are unable to exploit the spatial information of local visual features and lack a structure that unifies semantic objects and visual features, which limits the symbiosis between the two components. This paper proposes SymbioLCD2, which creates a unified graph structure to integrate semantic objects and visual features symbiotically. Our novel graph-based LCD system utilizes the unified graph structure by applying a Weisfeiler-Lehman graph kernel with temporal constraints to robustly predict loop closure candidates. Evaluation of the proposed system shows that a unified graph structure incorporating semantic objects and visual features improves LCD prediction accuracy, illustrating that the proposed graph structure provides a strong symbiosis between these two complementary components. It also outperforms other machine learning algorithms, such as SVM, Decision Tree, Random Forest, Neural Network and GNN-based Graph Matching Networks. Furthermore, it shows good performance in detecting loop closure candidates earlier than state-of-the-art SLAM systems, demonstrating that the extended semantic and spatial awareness provided by the unified graph structure significantly impacts LCD performance.
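The abstract names a Weisfeiler-Lehman (WL) graph kernel with temporal constraints as the core of loop closure prediction. This is a minimal sketch of that general technique, not the authors' implementation. It rests on several assumptions.

- **Graph representation.** Each keyframe is a small graph whose nodes carry labels for semantic objects (e.g. `obj:chair`) or visual words (e.g. `vw:17`). This encoding is hypothetical.
- **WL features and score.** WL subtree features are computed by repeatedly appending each node's sorted neighbour labels to its own label. Two graphs are then scored with a normalized linear kernel.
- **Temporal constraint.** This is modeled, as an assumption, as a minimum frame gap `min_gap` that excludes recent keyframes from the candidate set.

The helper names and the `threshold` and `iterations` values are illustrative.

```python
import math
from collections import Counter

# Hypothetical graph representation: a pair (adj, labels) where
#   adj:    dict mapping node id -> list of neighbouring node ids
#   labels: dict mapping node id -> initial label, e.g. "obj:chair" or "vw:17"


def wl_features(adj, labels, iterations=2):
    """Count WL subtree labels collected over iterations 0..iterations."""
    current = dict(labels)
    features = Counter(current.values())
    for _ in range(iterations):
        # Refine each label with the sorted multiset of its neighbours' labels;
        # the concatenated string itself serves as the new (injective) label.
        current = {
            node: current[node] + "(" + ",".join(sorted(current[n] for n in adj[node])) + ")"
            for node in adj
        }
        features.update(current.values())
    return features


def dot(f1, f2):
    """Linear kernel between two sparse label-count vectors."""
    return sum(count * f2[label] for label, count in f1.items())


def wl_similarity(g1, g2, iterations=2):
    """Normalized WL subtree kernel: 1.0 for identical graphs, 0.0 for no shared labels."""
    f1 = wl_features(*g1, iterations=iterations)
    f2 = wl_features(*g2, iterations=iterations)
    denom = math.sqrt(dot(f1, f1) * dot(f2, f2))
    return dot(f1, f2) / denom if denom else 0.0


def best_loop_candidate(query_idx, graphs, min_gap=30, threshold=0.8, iterations=2):
    """Return (index, score) of the most similar sufficiently old keyframe, or (None, 0.0).

    Assumed temporal constraint: only keyframes at least `min_gap` frames older
    than the query are compared, so adjacent frames are not reported as loops.
    """
    best_idx, best_score = None, 0.0
    for idx in range(max(0, query_idx - min_gap + 1)):
        score = wl_similarity(graphs[query_idx], graphs[idx], iterations)
        if score >= threshold and score > best_score:
            best_idx, best_score = idx, score
    return best_idx, best_score


# Toy keyframes: one object node linked to the visual words observed on it.
chair = ({"o": ["a", "b"], "a": ["o"], "b": ["o"]},
         {"o": "obj:chair", "a": "vw:1", "b": "vw:2"})
table = ({"o": ["a"], "a": ["o"]}, {"o": "obj:table", "a": "vw:3"})
graphs = [chair, table, chair]

print(best_loop_candidate(2, graphs, min_gap=2))  # prints (0, 1.0)
```

The sketch uses the raw concatenated strings as refined labels. This keeps relabeling injective across graphs without a shared compression dictionary, but the labels grow with every iteration. A practical system would more likely hash or compress them.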
