Paper Title
Performance Analysis of Out-of-Distribution Detection on Trained Neural Networks
Paper Authors
Paper Abstract
Deep Learning has improved several areas in recent years. Deep Neural Networks (DNNs) have achieved remarkable results in non-safety-related applications; however, for using DNNs in safety-critical applications, approaches for verifying the robustness of such models are still missing. A common challenge for DNNs arises when they are exposed to out-of-distribution samples, i.e., inputs outside the scope of the training distribution, which nevertheless yield high-confidence outputs despite the network having no prior knowledge of such input. In this paper, we analyze three methods that separate in-distribution from out-of-distribution data, called supervisors, on four well-known DNN architectures. We find that outlier detection performance improves with the quality of the model. We also analyze the performance of each supervisor during the training procedure by applying it at predefined intervals to investigate how its performance evolves as training proceeds. We observe that understanding the relationship between training results and supervisor performance is crucial for improving a model's robustness and for indicating which input samples require further measures to improve the robustness of a DNN. In addition, our work paves the way towards an instrument for safety argumentation in safety-critical applications. This paper is an extended version of our previous work presented at SEAA 2019 (cf. [1]); here, we elaborate on the metrics used, add an additional supervisor, and test them on two additional datasets.
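The abstract does not name the supervisor methods, so the following is only a minimal illustrative sketch of the general idea, not the paper's implementation. It assumes a common baseline-style supervisor that flags an input as out-of-distribution when the network's top softmax confidence falls below a threshold; the function names and the threshold value are hypothetical.

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def supervisor_flags(logits: np.ndarray, threshold: float = 0.9) -> np.ndarray:
    """Hypothetical supervisor: flag a sample as out-of-distribution
    when the DNN's maximum softmax confidence is below `threshold`."""
    confidence = softmax(logits).max(axis=-1)
    return confidence < threshold

# Example: logits for 3 samples over 5 classes. The last row is
# near-uniform, so the supervisor rejects it as out-of-distribution.
logits = np.array([[8.0, 0.1, 0.2, 0.1, 0.0],
                   [0.1, 7.5, 0.3, 0.2, 0.1],
                   [1.0, 1.1, 0.9, 1.0, 1.05]])
print(supervisor_flags(logits))  # [False False  True]
```

Applied at predefined checkpoints during training, such a supervisor's rejection decisions can be compared against held-out in- and out-of-distribution data to track how detection performance develops with model quality.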