Paper Title
MIME: Adapting a Single Neural Network for Multi-task Inference with Memory-efficient Dynamic Pruning
Paper Authors
Paper Abstract
Recent years have seen a paradigm shift towards multi-task learning. This calls for memory- and energy-efficient solutions for inference in a multi-task scenario. We propose an algorithm-hardware co-design approach called MIME. MIME reuses the weight parameters of a trained parent task and learns task-specific threshold parameters for inference on multiple child tasks. We find that MIME results in highly memory-efficient DRAM storage of neural-network parameters for multiple tasks compared to conventional multi-task inference. In addition, MIME results in input-dependent dynamic neuronal pruning, thereby enabling energy-efficient inference with higher throughput on systolic-array hardware. Our experiments with the benchmark datasets CIFAR10, CIFAR100, and Fashion-MNIST (as child tasks) show that MIME achieves ~3.48x memory efficiency and ~2.4-3.1x energy savings compared to conventional multi-task inference in Pipelined task mode.
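To make the mechanism described in the abstract concrete, here is a minimal PyTorch sketch of the core idea: a layer whose weights are frozen copies of the parent task's weights, paired with a small learnable threshold vector per child task that zeroes out activations below the threshold, so which neurons fire depends on the input. The class `ThresholdedLayer`, the shifted-ReLU gating, and all names below are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class ThresholdedLayer(nn.Module):
    """Linear layer with frozen parent-task weights and learnable
    per-task activation thresholds (illustrative sketch)."""

    def __init__(self, in_features: int, out_features: int, num_tasks: int):
        super().__init__()
        self.linear = nn.Linear(in_features, out_features)
        # Reuse the parent task's weights verbatim: freezing them means
        # only the tiny threshold tensors differ between child tasks.
        for p in self.linear.parameters():
            p.requires_grad = False
        # One learnable threshold per output neuron, per child task.
        self.thresholds = nn.Parameter(torch.zeros(num_tasks, out_features))

    def forward(self, x: torch.Tensor, task_id: int) -> torch.Tensor:
        pre = self.linear(x)
        t = self.thresholds[task_id]
        # Shifted ReLU: neurons whose pre-activation falls below the
        # task-specific threshold output exactly zero. Which neurons are
        # pruned therefore depends on the input (dynamic neuronal
        # pruning), and the shift keeps the threshold differentiable so
        # it can be trained per child task.
        return torch.relu(pre - t)


# Example: three child tasks share one set of frozen parent weights.
layer = ThresholdedLayer(512, 256, num_tasks=3)
x = torch.randn(8, 512)
out = layer(x, task_id=1)               # inference for child task 1
sparsity = (out == 0).float().mean()    # fraction of pruned activations
```

Under this reading, the memory savings the abstract reports follow directly: DRAM holds a single copy of the network weights plus one small threshold vector per task, rather than a full parameter set per task, while the zeroed activations let a systolic array skip the corresponding downstream computation.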