Paper Title

Tabula: Efficiently Computing Nonlinear Activation Functions for Secure Neural Network Inference

Paper Authors

Maximilian Lam, Michael Mitzenmacher, Vijay Janapa Reddi, Gu-Yeon Wei, David Brooks

Paper Abstract

Multiparty computation approaches to secure neural network inference commonly rely on garbled circuits for securely executing nonlinear activation functions. However, garbled circuits require excessive communication between server and client, impose significant storage overheads, and incur large runtime penalties. To reduce these costs, we propose an alternative to garbled circuits: Tabula, an algorithm based on secure lookup tables. Our approach precomputes, during an offline phase, lookup tables that contain the results of all possible nonlinear function calls. Because these tables incur storage costs that are exponential in the number of operands and the precision of the input values, we use quantization to reduce these storage costs and make the approach practical. This enables an online phase where securely computing the result of a nonlinear function requires just a single round of communication, with communication cost equal to twice the number of bits of the input to the nonlinear function. In practice, our approach costs 2 bytes of communication per nonlinear function call in the online phase. Compared to garbled circuits with 8-bit quantized inputs, when computing individual nonlinear functions during the online phase, experiments show Tabula with 8-bit activations uses between $280\times$ and $560\times$ less communication, is over $100\times$ faster, and uses a comparable (within a factor of 2) amount of storage; compared against other state-of-the-art protocols, Tabula achieves greater than $40\times$ communication reduction. This leads to significant performance gains over garbled circuits with quantized inputs during the online phase of secure neural network inference: Tabula reduces end-to-end inference communication by up to $9\times$ and achieves an end-to-end inference speedup of up to $50\times$, while imposing comparable storage and offline preprocessing costs.
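To make the online mechanism concrete, below is a minimal, self-contained sketch of a Tabula-style secure table lookup for a single 8-bit activation. This is an illustration under simplifying assumptions, not the authors' implementation: the helper names (`share`, `offline_build_table`, `online_eval`) are hypothetical, the offline phase is simulated by a trusted dealer rather than a cryptographic preprocessing protocol, and ReLU stands in for an arbitrary nonlinear function.

```python
# Sketch of a Tabula-style secure lookup table over additive secret shares.
# Assumptions: two parties, a trusted dealer simulating the offline phase,
# and 8-bit quantized activations (as in the paper's experiments).
import secrets

N_BITS = 8          # quantized activation precision
MOD = 1 << N_BITS   # additive secret sharing is done mod 2^8

def relu_8bit(x: int) -> int:
    """ReLU on an 8-bit value interpreted as signed two's complement."""
    signed = x - MOD if x >= MOD // 2 else x
    return max(signed, 0) % MOD

def share(value: int) -> tuple[int, int]:
    """Split a value into two additive shares mod 2^N_BITS."""
    s0 = secrets.randbelow(MOD)
    return s0, (value - s0) % MOD

def offline_build_table(f):
    """Offline phase (simulated dealer): sample a random mask r and build
    a secret-shared table T with T[(x + r) mod 2^n] holding shares of f(x)
    for every possible input x -- 2^n entries in total."""
    r = secrets.randbelow(MOD)
    t0, t1 = [0] * MOD, [0] * MOD
    for x in range(MOD):
        a, b = share(f(x))
        idx = (x + r) % MOD
        t0[idx], t1[idx] = a, b
    r0, r1 = share(r)                  # the mask itself is also secret-shared
    return (t0, r0), (t1, r1)

def online_eval(x0: int, x1: int, party0, party1) -> tuple[int, int]:
    """Online phase: one round in which each party sends its n-bit masked
    share, i.e. 2n bits (2 bytes at n = 8) of communication per call."""
    (t0, r0), (t1, r1) = party0, party1
    m0 = (x0 + r0) % MOD               # message from party 0
    m1 = (x1 + r1) % MOD               # message from party 1
    idx = (m0 + m1) % MOD              # both parties learn only the masked x + r
    return t0[idx], t1[idx]            # fresh shares of f(x)

# Usage: secret-share an input, do one table lookup, reconstruct the result.
p0, p1 = offline_build_table(relu_8bit)
x0, x1 = share(200)                    # 200 is -56 as signed 8-bit, so ReLU -> 0
y0, y1 = online_eval(x0, x1, p0, p1)
assert (y0 + y1) % MOD == relu_8bit(200)
```

Note how the online phase is a single exchange of masked shares: each party sends $n = 8$ bits, for $2n = 16$ bits (2 bytes) of total communication per activation, matching the cost stated in the abstract. The table occupies $2^n$ entries per function call, which is why quantizing activations to 8 bits is what keeps storage practical.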
