Paper Title
MCUNet: Tiny Deep Learning on IoT Devices
Paper Authors
Paper Abstract
Machine learning on tiny IoT devices based on microcontroller units (MCUs) is appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude smaller than that of mobile phones. We propose MCUNet, a framework that jointly designs the efficient neural architecture (TinyNAS) and the lightweight inference engine (TinyEngine), enabling ImageNet-scale inference on microcontrollers. TinyNAS adopts a two-stage neural architecture search approach that first optimizes the search space to fit the resource constraints, then specializes the network architecture within the optimized search space. TinyNAS can automatically handle diverse constraints (i.e., device, latency, energy, memory) at low search cost. TinyNAS is co-designed with TinyEngine, a memory-efficient inference library, to expand the search space and fit a larger model. TinyEngine adapts the memory scheduling according to the overall network topology rather than layer-wise optimization, reducing memory usage by 4.8x and accelerating inference by 1.7-3.3x compared to TF-Lite Micro and CMSIS-NN. MCUNet is the first to achieve >70% ImageNet top-1 accuracy on an off-the-shelf commercial microcontroller, using 3.5x less SRAM and 5.7x less Flash compared to quantized MobileNetV2 and ResNet-18. On visual and audio wake-word tasks, MCUNet achieves state-of-the-art accuracy and runs 2.4-3.4x faster than MobileNetV2- and ProxylessNAS-based solutions with 3.7-4.1x smaller peak SRAM. Our study suggests that the era of always-on tiny machine learning on IoT devices has arrived. Code and models can be found here: https://tinyml.mit.edu.
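To give intuition for why whole-network memory planning beats layer-wise allocation, the toy sketch below compares peak activation memory for a linear chain of layers under two strategies: keeping every activation alive versus reusing buffers so that only the current layer's input and output coexist. This is an illustrative simplification with made-up sizes, not TinyEngine's actual scheduler.

```python
# Toy illustration of activation-memory planning on a linear chain of layers.
# Sizes are hypothetical activation footprints in KB, not from the paper.
act_sizes = [96, 192, 64, 32, 16]

def peak_no_reuse(sizes):
    # Naive strategy: every activation buffer stays allocated for the whole
    # forward pass, so peak memory is the sum of all activations.
    return sum(sizes)

def peak_with_planning(sizes):
    # Planned strategy: with a whole-network schedule, only the input and
    # output of the layer currently executing must coexist, so buffers can
    # be reused (ping-pong allocation). Peak is the largest adjacent pair.
    return max(a + b for a, b in zip(sizes, sizes[1:]))

print(peak_no_reuse(act_sizes))       # 400 KB
print(peak_with_planning(act_sizes))  # 288 KB
```

In this toy case, planning over the whole chain cuts peak memory from 400 KB to 288 KB; real networks with branches and residual connections require a more careful lifetime analysis, which is what topology-aware scheduling provides.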