Paper Title
TransformNet: Self-supervised representation learning through predicting geometric transformations
Paper Authors
Paper Abstract
Deep neural networks need a large amount of training data, while in the real world there is a scarcity of data available for training purposes. To resolve this issue, unsupervised methods are used for training with limited data. In this report, we describe an unsupervised semantic feature learning approach for recognizing the geometric transformation applied to the input data. The basic concept of our approach is that someone who is unaware of the objects in an image would not be able to quantitatively predict the geometric transformation applied to it. This self-supervised scheme is based on a pretext task and a downstream task. The pretext classification task of quantifying the geometric transformations should force the CNN to learn high-level salient features of objects that are useful for image classification. In our baseline model, we define image rotations by multiples of 90 degrees. The CNN trained on this pretext task is then used for the classification of images in the CIFAR-10 dataset as a downstream task. We run the baseline method using various models, including ResNet, DenseNet, VGG-16, and NIN, with a varied number of rotations in both feature-extraction and fine-tuning settings. As an extension of this baseline model, we experiment with transformations other than rotation in the pretext task. We compare the performance of the selected models in various settings with different transformations applied to the images, various data augmentation techniques, and different optimizers. This series of experiments helps us demonstrate the recognition accuracy of our self-supervised model when applied to a downstream task of classification.
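The rotation pretext task summarized above can be illustrated with a minimal sketch. The snippet below is not the authors' code: it assumes a PyTorch setup, a toy convolutional network (`SmallConvNet`) standing in for ResNet/DenseNet/VGG-16/NIN, and illustrative hyperparameters. Each CIFAR-10 image is rotated by 0, 90, 180, and 270 degrees, and the network is trained to predict which rotation was applied; the original class labels are ignored during this pretext stage.

```python
# Minimal sketch of the rotation pretext task (illustrative, not the paper's exact setup).
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import datasets, transforms
from torch.utils.data import DataLoader


def rotate_batch(images):
    """Return all four 90-degree rotations of a batch along with rotation labels 0-3."""
    rotated = [torch.rot90(images, k, dims=(2, 3)) for k in range(4)]
    labels = torch.arange(4).repeat_interleave(images.size(0))
    return torch.cat(rotated, dim=0), labels


class SmallConvNet(nn.Module):
    """Toy CNN standing in for the ResNet/DenseNet/VGG-16/NIN backbones used in the report."""
    def __init__(self, num_classes=4):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(128, num_classes)

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))


if __name__ == "__main__":
    train_set = datasets.CIFAR10("data", train=True, download=True,
                                 transform=transforms.ToTensor())
    loader = DataLoader(train_set, batch_size=128, shuffle=True)
    model = SmallConvNet(num_classes=4)           # 4 rotation classes for the pretext task
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

    for images, _ in loader:                      # CIFAR-10 class labels are ignored here
        inputs, rot_labels = rotate_batch(images)
        loss = F.cross_entropy(model(inputs), rot_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        break                                     # single step shown for brevity
```

After this pretext training, the learned `features` would be reused for the downstream CIFAR-10 classification task, either frozen (feature extraction) or fine-tuned, corresponding to the two transfer settings mentioned in the abstract.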