Paper Title
Efficient shallow learning as an alternative to deep learning
Paper Authors
Paper Abstract
The realization of complex classification tasks requires training deep learning (DL) architectures consisting of tens or even hundreds of convolutional and fully connected hidden layers, which is far from the reality of the human brain. According to the DL rationale, the first convolutional layer reveals localized patterns in the input, and the following layers reveal progressively larger-scale patterns, until a class of inputs is reliably characterized. Here, we demonstrate that with a fixed ratio between the depths of the first and second convolutional layers, the error rates of the generalized shallow LeNet architecture, consisting of only five layers, decay as a power law with the number of filters in the first convolutional layer. Extrapolating this power law indicates that the generalized LeNet can achieve the small error rates previously obtained for the CIFAR-10 database using DL architectures. A power law with a similar exponent also characterizes the generalized VGG-16 architecture; however, VGG-16 requires significantly more operations than LeNet to achieve a given error rate. This power-law phenomenon governs various generalized LeNet and VGG-16 architectures, hinting at universal behavior and suggesting a quantitative hierarchical time-space complexity among machine learning architectures. Additionally, a conservation law along the convolutional layers, the square root of their size times their depth, is found to asymptotically minimize error rates. The efficient shallow learning demonstrated in this study calls for further quantitative examination using various databases and architectures, as well as accelerated implementation using future dedicated hardware developments.
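The abstract's two quantitative ingredients, a "generalized LeNet" whose first convolutional layer has a variable number of filters d with a fixed depth ratio r between the second and first convolutional layers, and a power-law decay of the error rate, epsilon(d) ≈ A·d^(−rho), can be made concrete with a short sketch. The PyTorch/NumPy sketch below is a minimal illustration only: the classic LeNet-5 layout (5×5 kernels, two conv layers, three fully connected layers, 32×32 RGB inputs as in CIFAR-10), the default ratio r = 2, the log-log least-squares fit, and the sample error values are all assumptions, since the abstract does not specify the paper's exact kernel sizes, pooling, or fitting procedure. One possible reading of the conservation law mentioned in the abstract, also an assumption, is that sqrt(S_l)·D_l stays approximately constant across convolutional layers l, where S_l is a layer's size and D_l its depth.

```python
# Minimal sketch (assumptions noted above): a "generalized LeNet" whose first
# conv layer has d filters and whose second has r*d filters, plus a power-law
# fit of error rate vs. d, epsilon(d) ~ A * d**(-rho).
import numpy as np
import torch
import torch.nn as nn


class GeneralizedLeNet(nn.Module):
    """Five-layer LeNet-style network for 32x32 RGB inputs (e.g. CIFAR-10).

    d : number of filters in the first convolutional layer.
    r : fixed ratio between the depths of the second and first conv layers.
    """

    def __init__(self, d: int, r: int = 2, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, d, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),      # 32 -> 14
            nn.Conv2d(d, r * d, kernel_size=5), nn.ReLU(), nn.MaxPool2d(2),  # 14 -> 5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(r * d * 5 * 5, 120), nn.ReLU(),
            nn.Linear(120, 84), nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))


def fit_power_law(d_values, error_rates):
    """Fit epsilon(d) = A * d**(-rho) by least squares in log-log space."""
    log_d, log_e = np.log(d_values), np.log(error_rates)
    slope, intercept = np.polyfit(log_d, log_e, 1)
    return np.exp(intercept), -slope  # A, rho


# Hypothetical error rates for increasing first-layer width d (illustration only).
d_values = np.array([6, 12, 24, 48])
errors = np.array([0.35, 0.29, 0.24, 0.20])
A, rho = fit_power_law(d_values, errors)
print(f"epsilon(d) ~ {A:.3f} * d^(-{rho:.3f})")
# Extrapolate the power law to a wider first layer, as the abstract describes:
print(f"predicted error at d=512: {A * 512 ** (-rho):.4f}")
```

As a usage note, instantiating `GeneralizedLeNet(d)` for a geometric sequence of d values, training each on CIFAR-10, and passing the resulting test error rates to `fit_power_law` reproduces the kind of extrapolation the abstract describes; the extrapolated error at large d is what the authors compare against deep architectures.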