Paper Title

AccSS3D: Accelerator for Spatially Sparse 3D DNNs

Authors

Omer, Om Ji; Laddha, Prashant; Kalsi, Gurpreet S; Thyagharajan, Anirud; Pillai, Kamlesh R; Kulkarni, Abhimanyu; Yao, Anbang; Chen, Yurong; Subramoney, Sreenivas

Abstract

Semantic understanding and completion of real-world scenes is a foundational primitive of 3D visual perception, widely used in high-level applications such as robotics, medical imaging, autonomous driving and navigation. Due to the curse of dimensionality, compute and memory requirements for 3D scene understanding grow cubically with voxel resolution, posing a huge impediment to realizing real-time, energy-efficient deployments. The inherent spatial sparsity present in the 3D world due to free space is fundamentally different from the channel-wise sparsity that has been extensively studied. We present Accelerator for Spatially Sparse 3D DNNs (AccSS3D), the first end-to-end solution for accelerating 3D scene understanding by exploiting the ample spatial sparsity. As an algorithm-dataflow-architecture co-designed system specialized for spatially sparse 3D scene understanding, AccSS3D includes novel spatial-locality-aware metadata structures, a near-zero-latency, spatial-sparsity-aware dataflow optimizer, a surface-orientation-aware point cloud reordering algorithm, and a co-designed hardware accelerator for spatial sparsity that exploits data reuse through systolic and multicast interconnects. The SSpNNA accelerator core, together with 64 KB of L1 memory, requires 0.92 mm² of area in a 16 nm process at 1 GHz. Overall, AccSS3D achieves a 16.8x speedup and a 2232x energy efficiency improvement for 3D sparse convolution compared to an Intel-i7-8700K 4-core CPU, which translates to an 11.8x end-to-end 3D semantic segmentation speedup and a 24.8x energy efficiency improvement (iso technology node).
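
To make the abstract's scaling argument concrete, here is a minimal sketch that compares the number of voxels in a dense grid with the number of voxels touched by a surface. It is an illustration only, not taken from the paper: the sphere-shell scene and the function names `dense_voxels` and `sphere_surface_voxels` are assumptions made for this example. The point it illustrates is that a dense grid grows with the cube of the resolution, while a surface occupies roughly a quadratic number of voxels, so the occupied fraction shrinks as resolution increases.

```python
def dense_voxels(resolution: int) -> int:
    """Number of voxels in a dense cubic grid with the given side length."""
    return resolution ** 3


def sphere_surface_voxels(resolution: int) -> int:
    """Count voxels whose centre lies within half a voxel of a sphere surface.

    The sphere is a hypothetical stand-in for a scanned surface; it is
    centred in the grid and sized to fit just inside it.
    """
    centre = (resolution - 1) / 2.0
    radius = resolution / 2.0 - 1.0
    count = 0
    for x in range(resolution):
        for y in range(resolution):
            for z in range(resolution):
                dist = ((x - centre) ** 2 + (y - centre) ** 2 + (z - centre) ** 2) ** 0.5
                if abs(dist - radius) <= 0.5:
                    count += 1
    return count


def occupancy(resolution: int) -> float:
    """Fraction of the dense grid that the surface actually occupies."""
    return sphere_surface_voxels(resolution) / dense_voxels(resolution)


if __name__ == "__main__":
    for n in (16, 32, 64):
        print(f"resolution={n:3d}  dense={dense_voxels(n):7d}  "
              f"occupied={sphere_surface_voxels(n):6d}  fraction={occupancy(n):.4f}")
```

Running the script prints the dense and occupied voxel counts at three resolutions; the occupied fraction should fall as the grid gets finer, which is the kind of spatial sparsity a sparse 3D convolution can skip over.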
