Paper Title


When does mixup promote local linearity in learned representations?

Authors

Arslan Chaudhry, Aditya Krishna Menon, Andreas Veit, Sadeep Jayasumana, Srikumar Ramalingam, Sanjiv Kumar

Abstract


Mixup is a regularization technique that artificially produces new samples as convex combinations of original training points. This simple technique has shown strong empirical performance and has been heavily used as part of semi-supervised learning methods such as MixMatch~\citep{berthelot2019mixmatch} and Interpolation Consistency Training (ICT)~\citep{verma2019interpolation}. In this paper, we look at Mixup through a \emph{representation learning} lens in a semi-supervised learning setup. In particular, we study the role of Mixup in promoting linearity in the learned network representations. To this end, we study two questions: (1) how does the Mixup loss, which enforces linearity in the \emph{last} network layer, propagate that linearity to the \emph{earlier} layers? and (2) how does enforcing a stronger Mixup loss on more than two data points affect the convergence of training? We empirically investigate these properties of Mixup on vision datasets such as CIFAR-10, CIFAR-100, and SVHN. Our results show that supervised Mixup training does not make \emph{all} the network layers linear; in fact, the \emph{intermediate layers} become more non-linear during Mixup training compared to a network trained \emph{without} Mixup. However, when Mixup is used as an unsupervised loss, we observe that all the network layers become more linear, resulting in faster training convergence.
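For context, the convex combination the abstract refers to is the standard two-point Mixup interpolation (a minimal sketch in the notation of the original Mixup formulation; the mixing coefficient $\lambda$ and the Beta parameter $\alpha$ are generic and not specific to this paper):

\[
\tilde{x} = \lambda x_i + (1-\lambda)\, x_j, \qquad \tilde{y} = \lambda y_i + (1-\lambda)\, y_j, \qquad \lambda \sim \mathrm{Beta}(\alpha, \alpha), \quad \lambda \in [0, 1],
\]

where $(x_i, y_i)$ and $(x_j, y_j)$ are two training examples. The unsupervised, ICT-style use of Mixup discussed in the abstract applies the same interpolation to the model's predictions rather than to the labels.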
