Title

Dynamic Sampling Rate: Harnessing Frame Coherence in Graphics Applications for Energy-Efficient GPUs

Authors

Martí Anglada, Enrique de Lucas, Joan-Manuel Parcerisa, Juan L. Aragón, Antonio González

Abstract

In real-time rendering, a 3D scene is modelled with meshes of triangles that the GPU projects to the screen. The triangles are discretized by sampling each one at regular spatial intervals to generate fragments, to which a shader program then adds texture and lighting effects. Realistic scenes require detailed geometric models, complex shaders, high-resolution displays and high screen refresh rates, all of which come at a great cost in compute time and energy. This cost is often dominated by the fragment shader, which runs for each sampled fragment. Conventional GPUs sample the triangles once per pixel; however, many screen regions contain low variation, produce identical fragments, and could be sampled at lower than pixel rate with no loss in quality. Additionally, as temporal frame coherence makes consecutive frames very similar, such variations are usually maintained from frame to frame. This work proposes Dynamic Sampling Rate (DSR), a novel hardware mechanism to reduce redundancy and improve energy efficiency in graphics applications. DSR analyzes the spatial frequencies of the scene once it has been rendered. Then, it leverages the temporal coherence in consecutive frames to decide, for each region of the screen, the lowest sampling rate to employ in the next frame that maintains image quality. We evaluate the performance of a state-of-the-art mobile GPU architecture extended with DSR on a wide variety of applications. Experimental results show that DSR removes most of the redundancy inherent in the color computations at fragment granularity, which brings average speedups of 1.68x and energy savings of 40%.
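The per-region decision described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' hardware mechanism: the tile size (`TILE`), the candidate block sizes (`BLOCK_SIZES`), the flatness `THRESHOLD`, and all function names are hypothetical, and a simple max-minus-min variation test inside each block stands in for DSR's spatial-frequency analysis. The frame is assumed to be a grayscale image stored as a list of rows whose dimensions are multiples of the tile size.

```python
from typing import List

Frame = List[List[float]]

TILE = 8                 # hypothetical tile edge, in pixels
BLOCK_SIZES = (4, 2, 1)  # hypothetical candidate sampling blocks, coarsest first (1 = per-pixel)
THRESHOLD = 0.02         # hypothetical max variation tolerated inside one block


def block_is_flat(frame: Frame, x0: int, y0: int, size: int, threshold: float) -> bool:
    """True if all pixels of the size x size block at (x0, y0) differ by at most threshold."""
    values = [frame[y][x] for y in range(y0, y0 + size) for x in range(x0, x0 + size)]
    return max(values) - min(values) <= threshold


def choose_tile_rate(frame: Frame, tx: int, ty: int, threshold: float = THRESHOLD) -> int:
    """Return the coarsest block size for which every block in tile (tx, ty) is flat."""
    x_base, y_base = tx * TILE, ty * TILE
    for size in BLOCK_SIZES:
        if all(
            block_is_flat(frame, x_base + bx, y_base + by, size, threshold)
            for by in range(0, TILE, size)
            for bx in range(0, TILE, size)
        ):
            return size
    return 1  # fallback; 1x1 blocks are always flat for a non-negative threshold


def plan_next_frame(frame: Frame) -> List[List[int]]:
    """Per-tile block sizes derived from the frame just rendered, to be reused next frame."""
    tiles_y = len(frame) // TILE
    tiles_x = len(frame[0]) // TILE
    return [[choose_tile_rate(frame, tx, ty) for tx in range(tiles_x)] for ty in range(tiles_y)]


def shaded_fraction(plan: List[List[int]]) -> float:
    """Fraction of full-rate fragment shader invocations the plan would issue."""
    sizes = [size for row in plan for size in row]
    return sum(1 / (size * size) for size in sizes) / len(sizes)


# Usage: an 8x16 frame whose left tile is uniform and whose right tile is a checkerboard.
frame = [[0.5] * 8 + [float((x + y) % 2) for x in range(8)] for y in range(8)]
plan = plan_next_frame(frame)
print(plan)                   # [[4, 1]]
print(shaded_fraction(plan))  # 0.53125
```

In this sketch the plan is computed from the frame just rendered and applied to the next one, which is where temporal coherence matters: a tile whose content changes abruptly between frames would be shaded at a stale rate until the following analysis.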