Paper Title
Interactive Object Segmentation in 3D Point Clouds
Paper Authors
Paper Abstract
We propose an interactive approach for 3D instance segmentation, where users can iteratively collaborate with a deep learning model to segment objects directly in a 3D point cloud. Current methods for 3D instance segmentation are generally trained in a fully supervised fashion, which requires large amounts of costly training labels and does not generalize well to classes unseen during training. Few works have attempted to obtain 3D segmentation masks using human interactions, and existing methods rely on user feedback in the 2D image domain. As a consequence, users are required to constantly switch between 2D images and 3D representations, and custom architectures are needed to combine multiple input modalities. Therefore, integration with existing standard 3D models is not straightforward. The core idea of this work is to enable users to interact directly with 3D point clouds by clicking on desired 3D objects of interest (or their background) to interactively segment the scene in an open-world setting. Specifically, our method does not require training data from any target domain and can adapt to new environments for which no appropriate training sets are available. Our system continuously adjusts the object segmentation based on the user feedback and achieves accurate dense 3D segmentation masks with minimal human effort (a few clicks per object). Besides its potential for efficient labeling of large-scale and varied 3D datasets, our approach, in which the user directly interacts with the 3D environment, enables new applications in AR/VR and human-robot interaction.
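The abstract outlines an interaction loop in which the user places positive clicks on an object of interest (and negative clicks on its background), and the system refines a dense 3D mask after each click. The Python sketch below illustrates this loop only in spirit: the learned segmentation network is replaced by a simple distance-based stand-in, the user is simulated from ground truth, and the function names and click encoding are illustrative assumptions rather than the paper's implementation.

```python
# A minimal, self-contained sketch of a click-driven interactive
# segmentation loop on a point cloud. It is NOT the paper's model:
# the learned network is replaced by a purely geometric stand-in,
# and the "user" is simulated from ground truth for illustration.
import numpy as np

def click_distance_features(points, clicks):
    """Encode clicks as per-point distances to the nearest positive
    and negative click (a common way to feed clicks to a network)."""
    pos = np.array([points[i] for i, lbl in clicks if lbl == 1])
    neg = np.array([points[i] for i, lbl in clicks if lbl == 0])
    d_pos = (np.linalg.norm(points[:, None] - pos[None], axis=-1).min(axis=1)
             if len(pos) else np.full(len(points), np.inf))
    d_neg = (np.linalg.norm(points[:, None] - neg[None], axis=-1).min(axis=1)
             if len(neg) else np.full(len(points), np.inf))
    return d_pos, d_neg

def predict_mask(points, clicks, radius=0.5):
    """Stand-in for a learned model: a point is foreground if it lies
    within `radius` of a positive click and is closer to a positive
    click than to any negative click."""
    d_pos, d_neg = click_distance_features(points, clicks)
    return (d_pos < radius) & (d_pos < d_neg)

# Toy scene: a compact cluster (the object) plus scattered background.
rng = np.random.default_rng(0)
obj = rng.normal(loc=[0.0, 0.0, 0.0], scale=0.2, size=(200, 3))
bg = rng.uniform(low=-2.0, high=2.0, size=(800, 3))
points = np.vstack([obj, bg])
gt = np.zeros(len(points), dtype=bool)
gt[:200] = True

clicks = [(0, 1)]  # one initial positive click on the object
for _ in range(10):
    mask = predict_mask(points, clicks)
    errors = mask ^ gt
    if not errors.any():
        break
    # Simulated user feedback: correct one wrong point per round
    # (false positive -> negative click, false negative -> positive click).
    idx = int(np.flatnonzero(errors)[0])
    clicks.append((idx, int(gt[idx])))

mask = predict_mask(points, clicks)
iou = (mask & gt).sum() / (mask | gt).sum()
print(f"{len(clicks)} clicks, IoU = {iou:.2f}")
```

In an actual learning-based system, the distance-based `predict_mask` stand-in would be replaced by a network that consumes the point cloud together with the click encoding and outputs per-point foreground probabilities; the surrounding loop of predicting, inspecting, and adding corrective clicks stays the same.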