Paper Title
An Adaptive and Stability-Promoting Layerwise Training Approach for Sparse Deep Neural Network Architecture
Authors
Abstract
This work presents a two-stage adaptive framework for progressively developing deep neural network (DNN) architectures that generalize well for a given training data set. In the first stage, a layerwise training approach is adopted in which a new layer is added each time and trained independently while the parameters of the previous layers are frozen. We impose desirable structures on the DNN by employing manifold regularization, sparsity regularization, and physics-informed terms. We introduce an epsilon-delta stability-promoting concept as a desirable property for a learning algorithm and show that employing manifold regularization yields an epsilon-delta stability-promoting algorithm. We also derive the necessary conditions for the trainability of a newly added layer and investigate the training saturation problem. In the second stage of the algorithm (post-processing), a sequence of shallow networks is employed to extract information from the residual produced in the first stage, thereby improving the prediction accuracy. Numerical investigations on prototype regression and classification problems demonstrate that the proposed approach can outperform fully connected DNNs of the same size. Moreover, by equipping the physics-informed neural network (PINN) with the proposed adaptive architecture strategy to solve partial differential equations, we numerically show that adaptive PINNs are not only superior to standard PINNs but also produce interpretable hidden layers with provable stability. We also apply our architecture design strategy to solve inverse problems governed by elliptic partial differential equations.
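The first stage described in the abstract (add one layer at a time, train it with all earlier layers frozen, and penalize dense weights) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The residual-style layer `h + tanh(h W + b)`, the fixed random input lifting `w_in`, the least-squares initial readout, the step-halving rule, and all function names are assumptions made for illustration. Manifold regularization, physics-informed terms, and the second (post-processing) stage are omitted.

```python
import numpy as np

def forward(x, w_in, layers, readout):
    """Return the hidden state after all residual layers and the linear prediction."""
    h = np.tanh(x @ w_in)                      # assumed fixed random input lifting
    for w, b in layers:
        h = h + np.tanh(h @ w + b)             # assumed residual-style hidden layer
    return h, h @ readout

def train_new_layer(h_prev, y, readout, l1=1e-4, steps=200, lr=0.1):
    """Train one new layer plus the readout; h_prev is produced by frozen layers."""
    width = h_prev.shape[1]
    w, b = np.zeros((width, width)), np.zeros(width)   # new layer starts as the identity map
    v = readout.copy()

    def objective(w, b, v):
        pred = (h_prev + np.tanh(h_prev @ w + b)) @ v
        return np.mean((pred - y) ** 2) + l1 * np.abs(w).sum()  # MSE + L1 sparsity penalty

    loss = objective(w, b, v)
    for _ in range(steps):
        a = np.tanh(h_prev @ w + b)
        h = h_prev + a
        err = 2.0 * (h @ v - y) / y.size       # derivative of the MSE w.r.t. the prediction
        g_v = h.T @ err
        g_a = (err @ v.T) * (1.0 - a ** 2)     # backpropagate through tanh
        g_w = h_prev.T @ g_a + l1 * np.sign(w) # subgradient of the L1 term
        g_b = g_a.sum(axis=0)
        trial = (w - lr * g_w, b - lr * g_b, v - lr * g_v)
        trial_loss = objective(*trial)
        if trial_loss < loss:                  # accept only steps that reduce the objective
            (w, b, v), loss = trial, trial_loss
        else:
            lr *= 0.5                          # otherwise shrink the step size
    return (w, b), v

def layerwise_fit(x, y, width=8, n_layers=3, seed=0):
    """Grow the network one layer at a time; earlier layers are never updated."""
    rng = np.random.default_rng(seed)
    w_in = rng.normal(size=(x.shape[1], width))
    h0 = np.tanh(x @ w_in)
    readout = np.linalg.lstsq(h0, y, rcond=None)[0]    # initial linear readout
    layers = []
    history = [float(np.mean((h0 @ readout - y) ** 2))]
    for _ in range(n_layers):
        h_prev, _ = forward(x, w_in, layers, readout)  # output of the frozen layers
        new_layer, readout = train_new_layer(h_prev, y, readout)
        layers.append(new_layer)
        pred = forward(x, w_in, layers, readout)[1]
        history.append(float(np.mean((pred - y) ** 2)))
    return w_in, layers, readout, history
```

For example, calling `layerwise_fit(x, y)` with `x` of shape `(n, 1)` and `y = np.sin(2 * x)` returns the frozen-layer list together with a training-error history that has one entry per growth step. Because each new layer is initialized to the identity map and only objective-decreasing steps are accepted, the training error in this sketch cannot increase when a layer is added. That design choice is specific to this sketch and is not a statement of the paper's trainability conditions.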