Paper Title
DoPose-6D dataset for object segmentation and 6D pose estimation
Paper Authors
Paper Abstract
Scene understanding is essential to how intelligent robotic grasping and manipulation can become. The problem can be approached with different techniques: seen object segmentation, unseen object segmentation, or 6D pose estimation, and these techniques can be extended to multi-view settings. Because real datasets large enough for training are scarce, most work on these problems relies on synthetic datasets for training and uses the available real datasets only for evaluation. This motivated us to introduce a new dataset, called DoPose-6D. The dataset provides 6D pose, object segmentation, and multi-view annotations, serving all of the aforementioned techniques. It contains two types of scenes, bin picking and tabletop, with bin picking being the primary motivation for its collection. We illustrate the effect of the dataset in the context of unseen object segmentation and provide insights on mixing synthetic and real data for training. We train a Mask R-CNN model that is practical for industrial and robotic grasping applications, and we show how our dataset boosts the performance of that model. Our DoPose-6D dataset, trained network models, pipeline code, and ROS driver are available online.
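To make the training setup described above concrete, the following is a minimal sketch of how a class-agnostic Mask R-CNN could be fine-tuned on a mix of synthetic and real scenes using torchvision. The dataset classes SyntheticScenes and DoPoseScenes, as well as the simple concatenation mixing strategy, are illustrative assumptions and not necessarily the paper's exact pipeline.

```python
# Hypothetical sketch (not the paper's exact pipeline): fine-tune a
# class-agnostic Mask R-CNN on mixed synthetic and real data.
# SyntheticScenes / DoPoseScenes are assumed placeholder torch Datasets
# returning (image_tensor, target_dict) pairs in torchvision detection format.
import torch
from torch.utils.data import ConcatDataset, DataLoader
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.models.detection.faster_rcnn import FastRCNNPredictor
from torchvision.models.detection.mask_rcnn import MaskRCNNPredictor

NUM_CLASSES = 2  # background + one generic "object" class (unseen objects)

def build_model():
    # Start from a COCO-pretrained Mask R-CNN and swap in class-agnostic heads.
    model = maskrcnn_resnet50_fpn(weights="DEFAULT")
    in_feat = model.roi_heads.box_predictor.cls_score.in_features
    model.roi_heads.box_predictor = FastRCNNPredictor(in_feat, NUM_CLASSES)
    in_mask = model.roi_heads.mask_predictor.conv5_mask.in_channels
    model.roi_heads.mask_predictor = MaskRCNNPredictor(in_mask, 256, NUM_CLASSES)
    return model

def train(model, synthetic_ds, real_ds, epochs=10, device="cuda"):
    # Naive mixing strategy: concatenate synthetic and real samples and shuffle.
    # Staged schedules (synthetic pre-training, then real fine-tuning) are
    # another option; only the simplest variant is shown here.
    loader = DataLoader(ConcatDataset([synthetic_ds, real_ds]), batch_size=2,
                        shuffle=True, collate_fn=lambda b: tuple(zip(*b)))
    model.to(device).train()
    opt = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
    for _ in range(epochs):
        for images, targets in loader:
            images = [img.to(device) for img in images]
            targets = [{k: v.to(device) for k, v in t.items()} for t in targets]
            loss = sum(model(images, targets).values())  # sum of loss dict
            opt.zero_grad()
            loss.backward()
            opt.step()
```

Collapsing all objects into a single foreground class is what makes the model usable for unseen object segmentation: at test time it segments novel instances without requiring their categories at training time.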