Paper Title
Testing Feedforward Neural Networks Training Programs
Paper Authors
Paper Abstract
Nowadays, we are witnessing an increasing effort to improve the performance and trustworthiness of Deep Neural Networks (DNNs), with the aim of enabling their adoption in safety-critical systems such as self-driving cars. Multiple testing techniques have been proposed to generate test cases that can expose inconsistencies in the behavior of DNN models. These techniques implicitly assume that the training program is bug-free and appropriately configured. However, satisfying this assumption for a novel problem requires significant engineering work to prepare the data, design the DNN, implement the training program, and tune the hyperparameters in order to produce the model in which current automated test data generators search for corner-case behaviors. All of these model training steps can be error-prone. Therefore, it is crucial to detect and correct errors throughout all the engineering steps of DNN-based software systems, not only in the resulting DNN model. In this paper, we gather a catalog of training issues and, based on their symptoms and their effects on the behavior of the training program, we propose practical verification routines that detect these issues automatically by continuously validating that important properties of the learning dynamics hold during training. We then design TheDeepChecker, an end-to-end property-based debugging approach for DNN training programs. We assess the effectiveness of TheDeepChecker on synthetic and real-world buggy DL programs and compare it with Amazon SageMaker Debugger (SMD). Results show that TheDeepChecker's on-execution validation of DNN-based programs' properties succeeds in revealing several coding bugs and system misconfigurations, early on and at a low cost. Moreover, TheDeepChecker outperforms SMD's offline rule verification on training logs in terms of detection accuracy and DL bug coverage.
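To make the idea of on-execution property validation concrete, the sketch below instruments a toy training loop with checks on its learning dynamics, in the spirit of the approach described above. This is a minimal illustration only: the check names, thresholds, and the toy linear-regression program are hypothetical and do not reflect TheDeepChecker's actual API.

```python
import numpy as np

# Hypothetical property checks on learning dynamics (illustrative only).

def params_are_finite(params):
    """Property: parameters must stay free of NaN/Inf during training."""
    return all(np.isfinite(p).all() for p in params)

def loss_is_decreasing(losses, window=5):
    """Property: the mean loss over the last `window` steps should fall
    below the mean loss over the first `window` steps (downward trend)."""
    if len(losses) < 2 * window:
        return True  # not enough history to judge yet
    return np.mean(losses[-window:]) < np.mean(losses[:window])

def weights_are_updating(prev, curr, tol=1e-12):
    """Property: the optimizer should actually change the weights."""
    return any(np.abs(p - q).max() > tol for p, q in zip(prev, curr))

# Toy training program (1-D linear regression by gradient descent)
# instrumented with on-execution property validation.
rng = np.random.default_rng(0)
X = rng.normal(size=100)
y = 3.0 * X + 1.0 + 0.1 * rng.normal(size=100)

w, b, lr = 0.0, 0.0, 0.05
losses = []
for step in range(50):
    pred = w * X + b
    losses.append(float(np.mean((pred - y) ** 2)))
    grad_w = 2 * np.mean((pred - y) * X)
    grad_b = 2 * np.mean(pred - y)
    prev = [np.array([w]), np.array([b])]
    w -= lr * grad_w
    b -= lr * grad_b
    # Validate properties during training, not after the fact.
    assert params_are_finite([np.array([w]), np.array([b])])
    assert weights_are_updating(prev, [np.array([w]), np.array([b])])
assert loss_is_decreasing(losses)
print("all training-dynamics properties held")
```

A real deployment would hook such checks into the training framework's callback mechanism and report a diagnosis (e.g., diverging loss, dead weights) instead of raising an assertion.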