Paper Title

OO-VR: NUMA Friendly Object-Oriented VR Rendering Framework For Future NUMA-Based Multi-GPU Systems

Authors

Chenhao Xie, Xin Fu, Mingsong Chen, Shuaiwen Leon Song

Abstract

With their strong computation capability, NUMA-based multi-GPU systems are promising candidates for providing sustainable and scalable performance for virtual reality (VR). However, the entire multi-GPU system is viewed as a single GPU, which ignores the data locality of VR rendering during workload distribution and leads to a large number of remote memory accesses among GPU modules (GPMs). By comprehensively characterizing different kinds of parallel rendering frameworks, we observe that distributing each rendering object, together with the data it requires, to a single GPM can reduce inter-GPM memory accesses. However, this object-level rendering still faces two major challenges in NUMA-based multi-GPU systems: (1) the large data locality between the left and right views of the same object, along with the data sharing among different objects, and (2) the unbalanced workloads induced by software-level distribution and composition mechanisms. To tackle these challenges, we propose an object-oriented VR rendering framework (OO-VR) that co-optimizes software and hardware to provide a NUMA-friendly solution for VR multi-view rendering in NUMA-based multi-GPU systems. We first propose an object-oriented VR programming model that exploits the data sharing between the two views of the same object and groups objects into batches based on their texture sharing levels. Then, we design an object-aware runtime batch distribution engine and a distributed hardware composition unit to achieve balanced workloads among GPMs. Finally, evaluations on our VR-featured simulator show that OO-VR provides a 1.58x overall performance improvement and a 76% reduction in inter-GPM memory traffic over state-of-the-art multi-GPU systems. In addition, OO-VR provides NUMA-friendly performance scalability for future, larger multi-GPU scenarios in which the bandwidth asymmetry between local and remote memory keeps increasing.
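The two software-side ideas in the abstract (batching objects that share textures, then spreading batches across GPMs so that loads stay balanced) can be illustrated with a small sketch. This is not the paper's algorithm: the union-find grouping, the greedy heaviest-first placement, the function names `group_by_shared_textures` and `assign_batches`, and the per-object cost numbers are all assumptions made for illustration only.

```python
from collections import defaultdict


def group_by_shared_textures(objects):
    """Group objects into batches; objects that share any texture land together.

    `objects` maps an object name to the set of texture ids it samples
    (a hypothetical input format, not taken from the paper).
    """
    parent = {name: name for name in objects}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_user = {}  # texture id -> first object seen that uses it
    for name, textures in objects.items():
        for tex in textures:
            if tex in first_user:
                parent[find(name)] = find(first_user[tex])
            else:
                first_user[tex] = name

    batches = defaultdict(list)
    for name in objects:
        batches[find(name)].append(name)
    return [sorted(members) for members in batches.values()]


def assign_batches(batches, cost, num_gpms):
    """Greedy balance: place the heaviest batch on the least-loaded GPM."""
    loads = [0] * num_gpms
    placement = [[] for _ in range(num_gpms)]
    for batch in sorted(batches, key=lambda b: -sum(cost[o] for o in b)):
        gpm = loads.index(min(loads))
        placement[gpm].append(batch)
        loads[gpm] += sum(cost[o] for o in batch)
    return placement, loads


# Hypothetical scene: texture sets and per-object rendering costs are made up.
scene = {
    "wall": {"brick"},
    "floor": {"brick", "tile"},
    "lamp": {"metal"},
    "chair": {"wood"},
    "table": {"wood"},
}
cost = {"wall": 4, "floor": 3, "lamp": 2, "chair": 1, "table": 5}

batches = group_by_shared_textures(scene)
placement, loads = assign_batches(batches, cost, num_gpms=2)
print(batches)
print(placement, loads)
```

Keeping a whole batch on one GPM means its shared textures are fetched from local memory, while the greedy placement is only a simple stand-in for the runtime distribution engine described in the abstract.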
