Title


Pose Augmentation: Class-agnostic Object Pose Transformation for Object Recognition

Authors

Yunhao Ge, Jiaping Zhao, Laurent Itti

Abstract


Object pose increases intraclass object variance, which makes object recognition from 2D images harder. To render a classifier robust to pose variations, most deep neural networks try to eliminate the influence of pose by using large datasets with many poses for each class. Here, we propose a different approach: a class-agnostic object pose transformation network (OPT-Net) can transform an image along 3D yaw and pitch axes to synthesize additional poses continuously. Synthesized images lead to better training of an object classifier. We design a novel eliminate-add structure to explicitly disentangle pose from object identity: first eliminate pose information of the input image and then add target pose information (regularized as continuous variables) to synthesize any target pose. We trained OPT-Net on images of toy vehicles shot on a turntable from the iLab-20M dataset. After training on unbalanced discrete poses (5 classes with 6 poses per object instance, plus 5 classes with only 2 poses), we show that OPT-Net can synthesize balanced continuous new poses along yaw and pitch axes with high quality. Training a ResNet-18 classifier with original plus synthesized poses improves mAP accuracy by 9% over training on original poses only. Further, the pre-trained OPT-Net can generalize to new object classes, which we demonstrate on both iLab-20M and RGB-D. We also show that the learned features can generalize to ImageNet.
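The eliminate-add structure described above can be sketched schematically: first map the input image to a pose-free identity code ("eliminate"), then append a continuous target pose variable ("add") before decoding. The following is a minimal NumPy illustration, not the paper's implementation — the real OPT-Net uses convolutional generator/discriminator networks and adversarial training; all layer sizes, weights, and function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: flattened image, identity feature, (yaw, pitch) pose.
IMG, FEAT, POSE = 64, 32, 2

# "Eliminate" stage: map image -> pose-free identity code.
W_enc = rng.standard_normal((FEAT, IMG)) * 0.1
# "Add" stage: decode identity code concatenated with the target pose.
W_dec = rng.standard_normal((IMG, FEAT + POSE)) * 0.1

def synthesize(image, target_pose):
    """Eliminate pose info from `image`, then add a target (yaw, pitch)
    as continuous variables to render the object at a new pose."""
    identity = np.tanh(W_enc @ image)               # pose-eliminated identity code
    code = np.concatenate([identity, target_pose])  # append continuous pose variables
    return np.tanh(W_dec @ code)                    # synthesized image at target pose

img = rng.standard_normal(IMG)
# Because pose is continuous, yaw can be swept smoothly while identity stays fixed.
new_views = [synthesize(img, np.array([yaw, 0.0])) for yaw in np.linspace(-1, 1, 5)]
```

In the paper's pipeline, synthesized views like `new_views` are added to the original training set to balance poses before training the ResNet-18 classifier.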
