Accelerating the creation of instance segmentation training sets through bounding box annotation

Paper title

Accelerating the creation of instance segmentation training sets through bounding box annotation

Authors

Niels Sayez, Christophe De Vleeschouwer

Abstract

Collecting image annotations remains a significant burden when deploying a CNN in a specific application context. This is especially the case when the annotation consists of binary masks covering object instances. Our work proposes to delineate instances in three steps, based on a semi-automatic approach: (1) the extreme points of an object (left-most, right-most, top-most, and bottom-most pixels) are manually defined, thereby providing the object bounding box; (2) a universal automatic segmentation tool such as Deep Extreme Cut is used to turn the bounded object into a segmentation mask that matches the extreme points; and (3) the predicted mask is manually corrected. Various strategies are then investigated to balance human annotation resources between bounding-box definition and mask correction, including strategies in which the correction of instance masks is prioritized based on their overlap with other instance bounding boxes, or on the outcome of an instance segmentation model trained on a partially annotated dataset. Our experimental study considers a team-sport player segmentation task and measures how the accuracy of the Panoptic-DeepLab instance segmentation model depends on the strategy used to allocate human annotation resources. It reveals that defining extreme points alone results in a model accuracy that would require up to 10 times more resources if the masks were defined through fully manual delineation of instances. When targeting higher accuracies, prioritizing mask correction among the training-set instances is also shown to save up to 80% of correction annotation resources compared to a systematic frame-by-frame correction of instances, for the same trained instance segmentation model accuracy.
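To make step (1) and the overlap-based prioritization more concrete, the following is a minimal sketch, not the authors' implementation. It derives a bounding box from four clicked extreme points and ranks instances for mask correction by how much their box overlaps the boxes of other instances. The abstract does not specify the exact overlap criterion, so the score used here (summed intersection area with other boxes, divided by the instance's own box area) is an assumption, as are all function names and the `budget` parameter. The Deep Extreme Cut step is not shown.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]              # (x, y) in pixel coordinates
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def box_from_extreme_points(points: Sequence[Point]) -> Box:
    """Tight bounding box implied by the clicked extreme points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def intersection_area(a: Box, b: Box) -> float:
    """Area of the rectangle shared by two boxes (0 if they are disjoint)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(w, 0.0) * max(h, 0.0)


def overlap_score(index: int, boxes: Sequence[Box]) -> float:
    """Assumed criterion: summed overlap with all other boxes / own box area."""
    own = boxes[index]
    area = (own[2] - own[0]) * (own[3] - own[1])
    if area <= 0:
        return 0.0
    total = sum(
        intersection_area(own, other)
        for j, other in enumerate(boxes)
        if j != index
    )
    return total / area


def correction_order(boxes: Sequence[Box], budget: int) -> List[int]:
    """Indices of the instances to correct first, most-overlapped first.

    `budget` is a hypothetical cap on how many masks an annotator corrects.
    """
    scores = [overlap_score(i, boxes) for i in range(len(boxes))]
    ranked = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    return ranked[:budget]


# Example: two players whose boxes overlap, and one isolated player.
clicks = [(10, 50), (40, 60), (25, 20), (30, 100)]  # left, right, top, bottom
boxes = [box_from_extreme_points(clicks), (30, 30, 70, 110), (200, 20, 230, 100)]
to_correct = correction_order(boxes, budget=2)
```

With a correction budget of two, the two overlapping players are selected and the isolated one is left with its automatically predicted mask. This reflects the intuition that masks of crowded instances are more likely to need manual correction.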
