Paper Title
Efficient Memory Management for Deep Neural Net Inference
Paper Authors
Abstract
While deep neural net inference used to be considered a task for servers only, recent advances in technology allow the task of inference to be moved to mobile and embedded devices, which is desirable for various reasons ranging from latency to privacy. These devices are limited not only by their compute power and battery, but also by their inferior physical memory and cache; an efficient memory manager therefore becomes a crucial component for deep neural net inference at the edge. We explore various strategies to smartly share memory buffers among the intermediate tensors of a deep neural net. Employing these strategies can result in a memory footprint up to 11% smaller than the state of the art.
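The abstract does not spell out the sharing strategies, but the core idea can be illustrated with a minimal sketch of one plausible scheme (an assumption, not necessarily the paper's exact algorithm): two intermediate tensors may occupy the same region of a shared arena if their lifetimes — the span from the operator that produces them to the last operator that consumes them — do not overlap. The greedy placer below considers tensors in decreasing size order and assigns each one the lowest arena offset that does not collide with any already-placed, lifetime-overlapping tensor; the function names and tuple layout are hypothetical.

```python
# Hypothetical sketch of greedy offset assignment for intermediate
# tensors in a shared memory arena. Tensors with non-overlapping
# lifetimes may reuse the same bytes, shrinking the peak footprint
# compared with giving every tensor its own buffer.

def assign_offsets(tensors):
    """tensors: list of (name, size_bytes, first_op, last_op).

    Returns ({name: offset}, peak_arena_size).
    """
    placed = []   # (offset, size, first_op, last_op)
    offsets = {}
    # Larger tensors are placed first: they are hardest to fit.
    for name, size, first, last in sorted(tensors, key=lambda t: -t[1]):
        # Regions already occupied by tensors alive at the same time.
        busy = sorted((off, off + sz)
                      for off, sz, f, l in placed
                      if not (last < f or l < first))
        offset = 0
        for lo, hi in busy:
            if offset + size <= lo:
                break            # found a gap large enough
            offset = max(offset, hi)
        offsets[name] = offset
        placed.append((offset, size, first, last))
    peak = max((off + sz for off, sz, _, _ in placed), default=0)
    return offsets, peak
```

For a chain where tensor "a" (32 B, alive ops 0–1) feeds "b" (16 B, ops 1–2) which feeds "c" (16 B, ops 2–3), "c" can reuse "a"'s region, so the arena peaks at 48 B instead of the 64 B a naive one-buffer-per-tensor plan would need.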