Paper Title
High-level Modeling of Manufacturing Faults in Deep Neural Network Accelerators
Paper Authors
Paper Abstract
The advent of data-driven real-time applications requires the implementation of Deep Neural Networks (DNNs) on machine learning accelerators. Google's Tensor Processing Unit (TPU) is one such neural network accelerator, with systolic array-based matrix multiplication hardware at its core. A manufacturing fault in any state element of the matrix multiplication unit can cause unexpected errors in these inference networks. In this paper, we propose a formal model of permanent faults and their propagation in a TPU using the Discrete-Time Markov Chain (DTMC) formalism. The proposed model is analyzed with probabilistic model checking to reason about the likelihood of faulty outputs. The quantitative results obtained show that classification accuracy is sensitive to the type of permanent fault as well as its location, bit position, and the number of layers in the neural network. The conclusions from our theoretical model have been validated by experiments on a digit-recognition DNN.
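As a concrete illustration of the fault scenario the abstract describes, the sketch below simulates an output-stationary, systolic-style matrix multiplication in which one processing element's accumulator register has a permanent stuck-at-1 fault at a given bit position. This is a minimal behavioral model written for this summary, not the paper's DTMC model; the names `systolic_matmul`, `stuck_pe`, and `stuck_bit` are illustrative assumptions.

```python
import numpy as np

def systolic_matmul(A, B, stuck_pe=None, stuck_bit=None):
    """Output-stationary systolic-style matmul: PE (i, j) accumulates
    A[i, k] * B[k, j] over k. If stuck_pe/stuck_bit are given, that
    PE's accumulator has a permanent stuck-at-1 fault: the bit is
    forced to 1 after every accumulation step."""
    n, m = A.shape[0], B.shape[1]
    acc = np.zeros((n, m), dtype=np.int64)
    for k in range(A.shape[1]):          # one systolic "wavefront" per k
        for i in range(n):
            for j in range(m):
                acc[i, j] += int(A[i, k]) * int(B[k, j])
                if stuck_pe == (i, j) and stuck_bit is not None:
                    acc[i, j] |= (1 << stuck_bit)  # stuck-at-1 on one accumulator bit
    return acc

A = np.array([[1, 2], [3, 4]])
B = np.array([[5, 6], [7, 8]])
golden = systolic_matmul(A, B)                              # fault-free reference
faulty = systolic_matmul(A, B, stuck_pe=(0, 0), stuck_bit=5)
```

In this toy run only the output element produced by the faulty PE is corrupted, and the magnitude of the error grows with the stuck bit's position, mirroring the abstract's observation that the impact of a permanent fault depends on both its location and its bit position.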