Paper Title
PennSyn2Real: Training Object Recognition Models without Human Labeling
Paper Authors
Paper Abstract
Scalable training data generation is a critical problem in deep learning. We propose PennSyn2Real, a photo-realistic synthetic dataset consisting of more than 100,000 4K images of more than 20 types of micro aerial vehicles (MAVs). The dataset can be used to generate arbitrary numbers of training images for high-level computer vision tasks such as MAV detection and classification. Our data generation framework combines chroma-keying, a mature cinematography technique, with a motion tracking system, providing artifact-free, curated, annotated images in which object orientation and lighting are controlled. This framework is easy to set up and can be applied to a broad range of objects, reducing the gap between synthetic and real-world data. We show that synthetic data generated using this framework can be directly used to train CNN models for common object recognition tasks such as detection and segmentation. We demonstrate competitive performance in comparison with training using only real images. Furthermore, bootstrapping the generated synthetic data in few-shot learning can significantly improve overall performance, reducing the number of training samples required to achieve a desired accuracy.
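The core idea behind the annotation-free pipeline described above is that chroma-keying makes segmentation masks and bounding boxes a byproduct of image capture rather than a human labeling task. A minimal sketch of that step, assuming a uniform green background and using only NumPy (the function names here are illustrative, not the paper's actual API):

```python
import numpy as np

def chroma_key_mask(image, key_rgb=(0, 255, 0), tol=60):
    """Binary foreground mask: pixels far from the key color are the object."""
    diff = image.astype(np.int16) - np.array(key_rgb, dtype=np.int16)
    dist = np.linalg.norm(diff, axis=-1)
    return dist > tol

def bbox_from_mask(mask):
    """Axis-aligned bounding box (x0, y0, x1, y1) of the foreground, or None."""
    ys, xs = np.nonzero(mask)
    if xs.size == 0:
        return None
    return int(xs.min()), int(ys.min()), int(xs.max()), int(ys.max())

# Toy frame: chroma-key green background with a gray "MAV" patch.
frame = np.zeros((64, 64, 3), dtype=np.uint8)
frame[...] = (0, 255, 0)               # key color
frame[20:40, 10:50] = (120, 120, 120)  # foreground object

mask = chroma_key_mask(frame)
print(bbox_from_mask(mask))  # → (10, 20, 49, 39)
```

The extracted mask can then be composited onto arbitrary background images, which is how a fixed set of captured frames yields "arbitrary numbers" of annotated training samples; the paper additionally uses motion tracking to control and record object pose, which this sketch omits.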