Mean Shift Mask Transformer for Unseen Object Instance Segmentation

Paper Title

Mean Shift Mask Transformer for Unseen Object Instance Segmentation

Authors

Yangxiao Lu, Yuqiao Chen, Nicholas Ruozzi, Yu Xiang

Abstract

Segmenting unseen objects from images is a critical perception skill that a robot needs to acquire. In robot manipulation, it helps a robot grasp and manipulate unseen objects. Mean shift clustering is a widely used method for image segmentation tasks. However, the traditional mean shift clustering algorithm is not differentiable, making it difficult to integrate into an end-to-end neural network training framework. In this work, we propose the Mean Shift Mask Transformer (MSMFormer), a new transformer architecture that simulates the von Mises-Fisher (vMF) mean shift clustering algorithm, allowing for the joint training and inference of both the feature extractor and the clustering. Its central component is a hypersphere attention mechanism, which updates object queries on a hypersphere. To illustrate the effectiveness of our method, we apply MSMFormer to unseen object instance segmentation. Our experiments show that MSMFormer achieves competitive performance compared to state-of-the-art methods for unseen object instance segmentation. The project page, appendix, video, and code are available at https://irvlutd.github.io/MSMFormer
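
To make the clustering idea in the abstract concrete, the following is a minimal NumPy sketch of classical vMF mean shift on the unit hypersphere. It is not the MSMFormer architecture or the authors' code. The function names, the concentration parameter `kappa`, and the iteration count are assumptions chosen for illustration. Each update weights every feature by exp(kappa * cosine similarity) with the current query, sums the weighted features, and projects the result back onto the sphere. This has the same form as a softmax-style attention step, which is presumably the connection the hypersphere attention mechanism exploits.

```python
import numpy as np


def vmf_mean_shift_step(queries, features, kappa=10.0):
    """One vMF mean shift update (illustrative sketch, not the paper's code).

    queries:  (Q, D) array of unit vectors (current cluster-center estimates).
    features: (N, D) array of unit vectors (e.g. per-pixel embeddings).
    kappa:    assumed vMF concentration; larger values give sharper kernels.
    """
    # Cosine similarity scaled by kappa acts as the vMF kernel log-weight.
    logits = kappa * (queries @ features.T)            # (Q, N)
    # Subtracting the row maximum only rescales each row's weights,
    # which does not change the direction after normalization.
    logits = logits - logits.max(axis=1, keepdims=True)
    weights = np.exp(logits)                           # (Q, N)
    shifted = weights @ features                       # (Q, D) weighted sums
    # Project back onto the unit hypersphere.
    return shifted / np.linalg.norm(shifted, axis=1, keepdims=True)


def vmf_mean_shift(queries, features, kappa=10.0, num_iters=10):
    """Run a fixed number of vMF mean shift updates."""
    for _ in range(num_iters):
        queries = vmf_mean_shift_step(queries, features, kappa)
    return queries
```

Because every operation here is a matrix product, an exponential, or a normalization, a fixed number of such updates is differentiable with respect to the features, unlike a clustering routine that relies on hard assignments and data-dependent stopping rules.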
