Title
Accelerating Training in Artificial Neural Networks with Dynamic Mode Decomposition
Authors
Abstract
Training of deep neural networks (DNNs) frequently involves optimizing several million or even billions of parameters. Even with modern computing architectures, the computational expense of DNN training can inhibit, for instance, network architecture design optimization, hyper-parameter studies, and integration into scientific research cycles. The key factor limiting performance is that, during optimization, evaluating the update rule for each weight requires both a feed-forward evaluation and back-propagation. In this work, we propose a method to decouple the evaluation of the update rule at each weight. First, Proper Orthogonal Decomposition (POD) is used to obtain a current estimate of the principal directions along which the weights of each layer evolve during training, based on the evolution observed over a few backpropagation steps. Then, Dynamic Mode Decomposition (DMD) is used to learn the dynamics of the weights in each layer within these principal directions. The DMD model is then used to evaluate an approximate converged state of the artificial neural network (ANN) being trained. Afterward, a number of backpropagation steps are performed starting from the DMD estimate, which leads to updated principal directions and an updated DMD model. This iterative process is repeated until convergence. By fine-tuning the number of backpropagation steps used for each DMD model estimation, the number of operations required to train the neural network can be significantly reduced. In this paper, the DMD acceleration method is explained in detail, along with a theoretical justification for the acceleration it provides. The method is illustrated on a regression problem of key interest to the scientific machine learning community: the prediction of a pollutant concentration field in a diffusion-advection-reaction problem.
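The POD/DMD cycle summarized above can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names `dmd_jump` and `train_with_dmd`, the affine reduced model a_{k+1} = A a_k + b, the fixed-point estimate of the converged state, the convergence tolerance `tol`, and the toy linear-regression problem are all hypothetical choices made here for illustration. The offset term b is an assumption of this sketch, added because plain gradient descent on a quadratic loss has affine rather than purely linear dynamics in the weights. The toy model has a single "layer"; for a multi-layer network one would presumably call `dmd_jump` separately on each layer's flattened weight snapshots.

```python
import numpy as np


def dmd_jump(snapshots, rank):
    """Estimate the converged weights of one layer from a short weight history.

    snapshots: array (n_weights, m); column k holds the flattened weights after
    backpropagation step k. Assumptions of this sketch:
      * POD basis = leading left singular vectors of the raw snapshot matrix;
      * reduced dynamics a_{k+1} = A a_k + b (DMD operator A plus an offset b),
        fitted jointly by least squares;
      * the converged state is the fixed point a* = (I - A)^{-1} b.
    """
    U, _, _ = np.linalg.svd(snapshots, full_matrices=False)
    U = U[:, :rank]                       # principal directions of evolution
    a = U.T @ snapshots                   # POD coefficients, shape (r, m)
    r, m = a.shape
    past = np.vstack([a[:, :-1], np.ones((1, m - 1))])
    Ab = a[:, 1:] @ np.linalg.pinv(past)  # least-squares fit of [A | b]
    A, b = Ab[:, :r], Ab[:, r]
    if np.max(np.abs(np.linalg.eigvals(A))) >= 1.0:
        return snapshots[:, -1]           # non-decaying fit: keep last weights
    a_star = np.linalg.solve(np.eye(r) - A, b)
    # Directions discarded by the POD truncation keep their last value.
    unresolved = snapshots[:, -1] - U @ a[:, -1]
    return U @ a_star + unresolved


def train_with_dmd(grad, w0, lr, steps_per_cycle, cycles, rank, tol=1e-6):
    """Alternate a few gradient-descent steps with one DMD jump per cycle."""
    w = w0.copy()
    for _ in range(cycles):
        history = [w]
        for _ in range(steps_per_cycle):
            w = w - lr * grad(w)          # one "backpropagation" step
            history.append(w)
        if np.linalg.norm(history[-1] - history[0]) <= tol * (1.0 + np.linalg.norm(w)):
            break                         # weights no longer move: converged
        w = dmd_jump(np.column_stack(history), rank)
    return w


# Hypothetical toy problem: linear least squares with badly scaled features,
# so plain gradient descent converges slowly along some directions.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 4)) * np.array([1.0, 0.5, 0.2, 0.1])
w_true = np.array([1.0, -2.0, 3.0, -4.0])
y = X @ w_true                            # noiseless, so w_true is the optimum


def grad(w):
    return X.T @ (X @ w - y) / len(y)


w_dmd = train_with_dmd(grad, np.zeros(4), lr=0.5, steps_per_cycle=10, cycles=2, rank=4)

w_gd = np.zeros(4)                        # plain gradient descent, 20 steps
for _ in range(20):
    w_gd = w_gd - 0.5 * grad(w_gd)

print("DMD error:", np.linalg.norm(w_dmd - w_true))
print("GD error: ", np.linalg.norm(w_gd - w_true))
```

Both runs spend 20 gradient evaluations. On this toy problem the DMD estimate should land much closer to the optimum than plain gradient descent, whose slowest modes decay by a factor close to 1 per step. A real network is nonlinear, so the reduced model is only a local approximation; this is why the method alternates DMD estimates with fresh backpropagation steps rather than relying on a single jump.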