Masked Autoencoders are Robust Data Augmentors

Paper title

Masked Autoencoders are Robust Data Augmentors

Authors

Xu, Haohang; Ding, Shuangrui; Zhao, Manqi; Jiang, Dongsheng

Abstract

Deep neural networks are capable of learning powerful representations to tackle complex vision tasks, but they exhibit undesirable properties such as over-fitting. Regularization techniques like image augmentation are therefore necessary for deep neural networks to generalize well. Nevertheless, most prevalent image augmentation recipes confine themselves to off-the-shelf linear transformations like scale, flip, and color jitter. Because they are hand-crafted, these augmentations are insufficient to generate truly hard augmented examples. In this paper, we propose a novel perspective of augmentation to regularize the training process. Inspired by the recent success of applying masked image modeling to self-supervised learning, we adopt the self-supervised masked autoencoder to generate distorted views of the input images. We show that utilizing such model-based nonlinear transformation as data augmentation can improve high-level recognition tasks. We term the proposed method Mask-Reconstruct Augmentation (MRA). Extensive experiments on various image classification benchmarks verify the effectiveness of the proposed augmentation. Specifically, MRA consistently enhances performance on supervised, semi-supervised, and few-shot classification.
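The mask-then-reconstruct idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the patch size, the mask ratio, and above all the `mean_fill_reconstructor` are assumptions made for illustration. In the paper the reconstruction comes from a pretrained masked autoencoder; here a trivial stand-in fills hidden patches with the mean of the visible pixels, so that the example runs with the standard library alone.

```python
import random


def random_patch_mask(grid_h, grid_w, mask_ratio, rng):
    """Pick a random subset of (row, col) patch indices to hide."""
    all_patches = [(r, c) for r in range(grid_h) for c in range(grid_w)]
    n_masked = int(round(mask_ratio * len(all_patches)))
    return set(rng.sample(all_patches, n_masked))


def mean_fill_reconstructor(image, masked, patch):
    """Hypothetical stand-in for a pretrained masked autoencoder.

    Fills every hidden patch with the mean of the visible pixels.
    A real model would predict the hidden content instead.
    """
    h, w = len(image), len(image[0])
    visible = [
        image[y][x]
        for y in range(h)
        for x in range(w)
        if (y // patch, x // patch) not in masked
    ]
    fill = sum(visible) / len(visible) if visible else 0.0
    return [
        [fill if (y // patch, x // patch) in masked else image[y][x]
         for x in range(w)]
        for y in range(h)
    ]


def mask_reconstruct_augment(image, patch=2, mask_ratio=0.5,
                             reconstruct=mean_fill_reconstructor, seed=None):
    """Hide random patches of a 2D grayscale image, then reconstruct them.

    Returns the augmented image and the set of masked patch indices.
    """
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image sides must be divisible by the patch size")
    rng = random.Random(seed)
    masked = random_patch_mask(h // patch, w // patch, mask_ratio, rng)
    return reconstruct(image, masked, patch), masked


# Usage: augment a 4x4 toy image split into 2x2 patches.
image = [[float(y * 4 + x) for x in range(4)] for y in range(4)]
augmented, masked = mask_reconstruct_augment(image, patch=2, mask_ratio=0.5, seed=0)
```

Passing the reconstructor as a parameter keeps the masking logic separate from the model, so a learned reconstruction function could be substituted for the stand-in without changing the rest of the sketch.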
