Paper Title


FAT: Training Neural Networks for Reliable Inference Under Hardware Faults

Paper Authors

Ussama Zahid, Giulio Gambardella, Nicholas J. Fraser, Michaela Blott, Kees Vissers

Abstract


Deep neural networks (DNNs) are state-of-the-art algorithms for multiple applications, spanning from image classification to speech recognition. While providing excellent accuracy, they often have enormous compute and memory requirements. As a result, quantized neural networks (QNNs) are increasingly being adopted and deployed, especially on embedded devices, thanks to their high accuracy, but also because they have significantly lower compute and memory requirements compared to their floating-point equivalents. QNN deployment is also being evaluated for safety-critical applications, such as automotive, avionics, medical or industrial. These systems require functional safety, guaranteeing failure-free behaviour even in the presence of hardware faults. In general, fault tolerance can be achieved by adding redundancy to the system, which further exacerbates the overall computational demands and makes it difficult to meet the power and performance requirements. In order to decrease the hardware cost of achieving functional safety, it is vital to explore domain-specific solutions which can exploit the inherent features of DNNs. In this work we present a novel methodology called fault-aware training (FAT), which includes error modeling during neural network (NN) training, to make QNNs resilient to specific fault models on the device. Our experiments show that by injecting faults in the convolutional layers during training, highly accurate convolutional neural networks (CNNs) can be trained which exhibit much better error tolerance compared to the original. Furthermore, we show that redundant systems built from QNNs trained with FAT achieve higher worst-case accuracy at lower hardware cost. This has been validated for numerous classification tasks including CIFAR10, GTSRB, SVHN and ImageNet.
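The core idea described above, injecting faults into convolutional layers during training, can be sketched in a few lines. This is a minimal illustration under an assumed stuck-at-zero fault model applied to one output channel of a convolution's activation map; `inject_channel_fault` is a hypothetical helper for exposition, not the authors' implementation, and in actual FAT training such a corruption would be applied to layer outputs inside the forward pass.

```python
import numpy as np

def inject_channel_fault(activations, channel, stuck_value=0.0):
    """Simulate a hardware fault by forcing one output channel of a
    conv layer's activation map (shape N, C, H, W) to a fixed value.
    Illustrative stuck-at fault model; not the paper's exact code."""
    faulty = activations.copy()          # leave the clean tensor intact
    faulty[:, channel, :, :] = stuck_value
    return faulty

# During fault-aware training, a fault like this would be injected into
# the forward pass so the network learns to tolerate it.
rng = np.random.default_rng(0)
acts = rng.standard_normal((2, 4, 8, 8))       # a batch of conv activations
faulty = inject_channel_fault(acts, channel=1)

print(np.all(faulty[:, 1] == 0.0))             # the faulty channel is clamped
print(np.array_equal(faulty[:, 0], acts[:, 0]))  # other channels untouched
```

In a real training loop, the faulty channel (and fault type) would typically be resampled per batch so the QNN becomes resilient to the whole fault model rather than one fixed fault.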
