Paper Title
Stacked unsupervised learning with a network architecture found by supervised meta-learning
Paper Authors
Paper Abstract
Stacked unsupervised learning (SUL) seems more biologically plausible than backpropagation, because learning is local to each layer. But SUL has fallen far short of backpropagation in practical applications, undermining the idea that SUL can explain how brains learn. Here we show an SUL algorithm that can perform completely unsupervised clustering of MNIST digits with accuracy comparable to that of unsupervised algorithms based on backpropagation. Our algorithm is exceeded only by self-supervised methods that require augmenting the training data with geometric distortions. The only prior knowledge in our unsupervised algorithm is implicit in the network architecture. Multiple convolutional "energy layers" contain a sum-of-squares nonlinearity, inspired by "energy models" of primary visual cortex. Convolutional kernels are learned with a fast minibatch implementation of the K-Subspaces algorithm. High accuracy requires preprocessing with an initial whitening layer, representations that are less sparse during inference than during learning, and rescaling for gain control. The hyperparameters of the network architecture are found by supervised meta-learning, which optimizes unsupervised clustering accuracy. We regard such dependence of unsupervised learning on prior knowledge implicit in the network architecture as biologically plausible, and analogous to the dependence of brain architecture on evolutionary history.
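To make the "energy layer" idea concrete, here is a minimal PyTorch sketch, not the authors' code: a grouped filter bank whose squared responses are summed within each group, echoing the energy model of complex cells in primary visual cortex. The class name EnergyLayer and all channel, group, and kernel sizes are illustrative assumptions, not values from the paper.

    import torch
    import torch.nn as nn

    class EnergyLayer(nn.Module):
        def __init__(self, in_channels, n_groups, group_size, kernel_size=5):
            super().__init__()
            # One filter bank; every consecutive group_size filters form one group.
            self.conv = nn.Conv2d(in_channels, n_groups * group_size,
                                  kernel_size, padding=kernel_size // 2, bias=False)
            self.n_groups = n_groups
            self.group_size = group_size

        def forward(self, x):
            r = self.conv(x)                     # linear filter responses
            b, _, h, w = r.shape
            r = r.view(b, self.n_groups, self.group_size, h, w)
            return (r ** 2).sum(dim=2)           # sum of squares within each group

    # Example: 1-channel input (e.g. grayscale digits) -> 32 energy channels.
    layer = EnergyLayer(in_channels=1, n_groups=32, group_size=4)
    y = layer(torch.randn(8, 1, 28, 28))         # y.shape == (8, 32, 28, 28)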
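The abstract also mentions learning the convolutional kernels with a fast minibatch implementation of K-Subspaces. The NumPy sketch below is an assumption about what such an implementation could look like, not the paper's code: each point is assigned to the d-dimensional subspace that captures the most of its energy, and each subspace basis is refreshed from a running scatter matrix of its assigned points. All parameter values are placeholders.

    import numpy as np

    def minibatch_k_subspaces(X, K=16, d=4, batch_size=256, epochs=5, seed=0):
        rng = np.random.default_rng(seed)
        n, p = X.shape
        # Random orthonormal initial bases, one (p, d) basis per subspace.
        bases = [np.linalg.qr(rng.standard_normal((p, d)))[0] for _ in range(K)]
        scatter = np.zeros((K, p, p))            # running scatter per subspace
        for _ in range(epochs):
            for start in range(0, n, batch_size):
                xb = X[start:start + batch_size]    # (b, p) minibatch
                # Projection energy ||U_k^T x||^2 for every point and subspace.
                energy = np.stack([((xb @ U) ** 2).sum(axis=1) for U in bases], axis=1)
                assign = energy.argmax(axis=1)      # best subspace per point
                for k in range(K):
                    pts = xb[assign == k]
                    if len(pts) == 0:
                        continue
                    scatter[k] += pts.T @ pts
                    # Top-d eigenvectors of the scatter become the new basis.
                    _, vecs = np.linalg.eigh(scatter[k])
                    bases[k] = vecs[:, -d:]
        return bases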
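Finally, the initial whitening layer mentioned in the abstract could be realized as a ZCA-style transform; the exact transform used in the paper is an assumption here. This sketch decorrelates pixels and equalizes their variance before the first energy layer; the regularizer eps is a hypothetical stabilizer for small eigenvalues.

    import numpy as np

    def zca_whiten(X, eps=1e-2):
        """X: (n, p) data matrix, rows are flattened images."""
        Xc = X - X.mean(axis=0)                  # center each pixel
        cov = Xc.T @ Xc / len(Xc)                # (p, p) pixel covariance
        vals, vecs = np.linalg.eigh(cov)
        # ZCA transform: rotate to eigenbasis, rescale, rotate back.
        W = vecs @ np.diag(1.0 / np.sqrt(vals + eps)) @ vecs.T
        return Xc @ W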