Paper Title

Globally Gated Deep Linear Networks

Paper Authors

Qianyi Li, Haim Sompolinsky

Paper Abstract

Recently proposed Gated Linear Networks present a tractable nonlinear network architecture and exhibit interesting capabilities such as learning with local error signals and reduced forgetting in sequential learning. In this work, we introduce a novel gating architecture, named Globally Gated Deep Linear Networks (GGDLNs), where gating units are shared among all processing units in each layer, thereby decoupling the architectures of the nonlinear but unlearned gatings and the learned linear processing motifs. We derive exact equations for the generalization properties in these networks in the finite-width thermodynamic limit, defined by $P,N\rightarrow\infty, P/N\sim O(1)$, where P and N are the training sample size and the network width, respectively. We find that the statistics of the network predictor can be expressed in terms of kernels that undergo shape renormalization through a data-dependent matrix compared to the GP kernels. Our theory accurately captures the behavior of finite-width GGDLNs trained with gradient descent dynamics. We show that kernel shape renormalization gives rise to rich generalization properties w.r.t. network width, depth, and L2 regularization amplitude. Interestingly, networks with sufficient gating units behave similarly to standard ReLU networks. Although gatings in the model do not participate in supervised learning, we show the utility of unsupervised learning of the gating parameters. Additionally, our theory allows the evaluation of the network's ability to learn multiple tasks by incorporating task-relevant information into the gating units. In summary, our work is the first exact theoretical solution of learning in a family of nonlinear networks with finite width. The rich and diverse behavior of the GGDLNs suggests that they are useful, analytically tractable models of learning single and multiple tasks in finite-width nonlinear deep networks.
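To make the architecture concrete, below is a minimal sketch (in NumPy) of a forward pass consistent with the description above: each layer has a small set of unlearned gating units, shared by all processing units in that layer, that depend nonlinearly on the raw input, while the processing path itself is linear in the learned weights. The choice of sign-of-random-projection gates, the layer sizes, and all function and variable names are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def make_ggdln(input_dim, width, depth, n_gates, seed=0):
    """Build an illustrative GGDLN: each layer has n_gates unlearned gating
    units shared by all processing units, plus n_gates learned linear maps."""
    rng = np.random.default_rng(seed)
    layers, in_dim = [], input_dim
    for _ in range(depth):
        gate_proj = rng.standard_normal((n_gates, input_dim))                       # fixed, never trained
        weights = rng.standard_normal((n_gates, width, in_dim)) / np.sqrt(in_dim)   # learned
        layers.append((gate_proj, weights))
        in_dim = width
    readout = rng.standard_normal(in_dim) / np.sqrt(in_dim)                         # learned linear readout
    return layers, readout

def ggdln_forward(x0, layers, readout):
    """Gates are nonlinear in the raw input x0 only; the processing path is
    linear in the learned weights for any fixed gating configuration."""
    h = x0
    for gate_proj, weights in layers:
        gates = (gate_proj @ x0 > 0).astype(float)        # shared binary gating units
        h = np.einsum('m,mij,j->i', gates, weights, h)    # gate-weighted linear map
    return readout @ h

# Example: a toy network evaluated on a random input.
layers, readout = make_ggdln(input_dim=10, width=50, depth=3, n_gates=4)
x = np.random.default_rng(1).standard_normal(10)
print(ggdln_forward(x, layers, readout))
```

In this sketch, gradient descent would only ever update `weights` and `readout`, while `gate_proj` stays fixed, reflecting the separation the abstract emphasizes between unlearned nonlinear gating and learned linear processing.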
