Paper Title
Sparse Linear Networks with a Fixed Butterfly Structure: Theory and Practice
Authors
Abstract
A butterfly network consists of logarithmically many layers, each with a linear number of non-zero weights (pre-specified). The fast Johnson-Lindenstrauss transform (FJLT) can be represented as a butterfly network followed by a projection onto a random subset of the coordinates. Moreover, a random matrix based on the FJLT approximates, with high probability, the action of any matrix on a vector. Motivated by these facts, we propose replacing any dense linear layer in a neural network with an architecture based on the butterfly network. The proposed architecture reduces the quadratic number of weights required in a standard dense layer to nearly linear, with little compromise in the expressiveness of the resulting operator. In a wide variety of experiments, including supervised prediction on both NLP and vision data, we show that this not only produces results that match, and at times outperform, existing well-known architectures, but also offers faster training and prediction in deployment. To understand the optimization problems posed by neural networks containing butterfly networks, we also study the optimization landscape of an encoder-decoder network in which the encoder is replaced by a butterfly network followed by a dense linear layer of smaller dimension. The theoretical results presented in the paper explain why training speed and outcome are not compromised by our proposed approach.
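To make the structure described above concrete, the following is a minimal NumPy sketch (not the authors' implementation) of a butterfly network acting on a vector of power-of-two length n: log2(n) layers, each mixing coordinate pairs with learnable 2x2 blocks, so each layer holds 2n non-zero weights and the whole network has O(n log n) parameters rather than the O(n^2) of a dense layer. The function name and weight layout are illustrative assumptions.

```python
import numpy as np

def butterfly_apply(x, weights):
    """Apply a butterfly network to a vector x of length n (n a power of 2).

    weights[k] has shape (n // 2, 2, 2): one 2x2 block per coordinate pair
    in layer k. There are log2(n) layers; layer k pairs coordinates that
    are 2**k apart, as in the FFT butterfly diagram.
    """
    n = x.size
    y = x.astype(float).copy()
    stride = 1
    for k in range(int(np.log2(n))):
        out = np.empty_like(y)
        pair = 0
        # Each block of 2*stride coordinates is split into `stride` pairs.
        for start in range(0, n, 2 * stride):
            for j in range(stride):
                i, ip = start + j, start + j + stride
                w = weights[k][pair]
                out[i] = w[0, 0] * y[i] + w[0, 1] * y[ip]
                out[ip] = w[1, 0] * y[i] + w[1, 1] * y[ip]
                pair += 1
        y = out
        stride *= 2
    return y
```

With every 2x2 block set to the normalized Hadamard block [[1, 1], [1, -1]] / sqrt(2), this network computes the Walsh-Hadamard transform, the dense mixing step at the heart of the FJLT; training instead learns the block entries by gradient descent.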