Paper Title
Compilation and Optimizations for Efficient Machine Learning on Embedded Systems
Paper Authors
Paper Abstract
Deep Neural Networks (DNNs) have achieved great success in a variety of machine learning (ML) applications, delivering high-quality inference solutions in computer vision, natural language processing, virtual reality, and other domains. However, DNN-based ML applications also impose substantially higher computational and storage requirements, which are particularly challenging for embedded systems with limited compute/storage resources, tight power budgets, and small form factors. Further challenges arise from diverse application-specific requirements, including real-time response, high-throughput performance, and reliable inference accuracy. To address these challenges, we introduce a series of effective design methodologies, including efficient ML model design, customized hardware accelerator design, and hardware/software co-design strategies, to enable efficient ML applications on embedded systems.