Title
A Structured Sparse Neural Network and Its Matrix Calculations Algorithm
Authors
Abstract
Gradient descent optimization and backpropagation are the most common methods for training neural networks, but they are computationally expensive for real-time applications, require large memory resources, and are difficult to converge for many networks and large datasets. [Pseudo]inverse models for training neural networks have emerged as powerful tools to overcome these issues. To implement these methods effectively, structured pruning may be applied to produce sparse neural networks. Although sparse neural networks are efficient in memory usage, most of their algorithms rely on the same fully loaded matrix calculation methods, which are inefficient for sparse matrices. Tridiagonal matrices are among the frequently used candidates for structuring neural networks, but they are not flexible enough to handle underfitting, overfitting, and generalization. In this paper, we introduce a nonsymmetric tridiagonal matrix with off-diagonal sparse entries and offset sub- and super-diagonals, along with algorithms for its [pseudo]inverse and determinant calculations. Traditional algorithms for matrix calculations, specifically inversion and determinant, are inefficient for these forms, especially for large matrices, e.g., larger datasets or deeper networks. A decomposition for lower triangular matrices is developed that factorizes the original matrix into a set of matrices whose inverses are calculated. For the cases where the matrix inverse does not exist, a least-squares-type pseudoinverse is provided. The present method is a direct routine, i.e., it executes in a predictable number of operations, and it is tested on randomly generated matrices of varying size. The results show a significant improvement in computational cost, especially as the matrix size increases.
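To make the matrix family concrete, the following is a minimal sketch, not the paper's algorithm: it builds a nonsymmetric matrix with a main diagonal plus offset sub- and super-diagonals using scipy.sparse.diags, then computes the determinant and a least-squares pseudoinverse with dense NumPy routines as a correctness baseline. The offsets k_sub and k_sup and the random entries are assumptions chosen for illustration only.

```python
# Minimal sketch (assumed parameters): a nonsymmetric matrix with offset
# sub- and super-diagonals, with dense baselines for determinant and
# pseudoinverse. This is NOT the paper's direct routine.
import numpy as np
from scipy.sparse import diags

n, k_sub, k_sup = 8, 2, 3          # size and hypothetical diagonal offsets
rng = np.random.default_rng(0)

A = diags(
    [rng.normal(size=n),           # main diagonal
     rng.normal(size=n - k_sub),   # sub-diagonal, shifted k_sub below main
     rng.normal(size=n - k_sup)],  # super-diagonal, shifted k_sup above main
    offsets=[0, -k_sub, k_sup],
    format="csr",
)

A_dense = A.toarray()
sign, logdet = np.linalg.slogdet(A_dense)  # determinant baseline
A_pinv = np.linalg.pinv(A_dense)           # least-squares pseudoinverse baseline

print("det sign:", sign, "log|det|:", logdet)
print("Moore-Penrose residual:",
      np.linalg.norm(A_dense @ A_pinv @ A_dense - A_dense))
```

A dense baseline like this scales as O(n^3), which is exactly the cost the paper's structured direct routine is designed to avoid; it is useful only for verifying results on small instances.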