Paper Title
Memory-Oriented Design-Space Exploration of Edge-AI Hardware for XR Applications
Authors
Abstract
Low-power edge-AI capabilities are essential for on-device extended reality (XR) applications to support the vision of the Metaverse. In this work, we investigate two representative XR workloads, (i) hand detection and (ii) eye segmentation, for hardware design-space exploration. For both applications, we train deep neural networks and analyze the impact of quantization and hardware-specific bottlenecks. Through simulations, we evaluate a CPU and two systolic inference accelerator implementations. Next, we compare these hardware solutions at advanced technology nodes. We then evaluate the impact of integrating state-of-the-art emerging non-volatile memory technology (STT/SOT/VGSOT MRAM) into the XR-AI inference pipeline. We find that significant energy benefits (>=24%) can be achieved for hand detection (IPS=10) and eye segmentation (IPS=0.1) by introducing non-volatile memory into the memory hierarchy of designs at the 7nm node while still meeting the minimum IPS (inferences per second) requirement. Moreover, we can realize a substantial reduction in area (>=30%) owing to the small form factor of MRAM compared to traditional SRAM.