Paper Title
GGArray: A Dynamically Growable GPU Array
Paper Authors
Paper Abstract
We present a dynamically growable GPU array (GGArray) that is implemented entirely on the GPU and does not require synchronization with the host. The idea is to simplify the programming of GPU applications that need dynamic memory by offering a structure that does not require pre-allocating GPU VRAM for the worst-case scenario. The GGArray is built on the LFVector: it uses an array of LFVectors in order to take advantage of the GPU architecture and the synchronization offered by thread blocks. The structure is compared with other state-of-the-art alternatives, namely a pre-allocated static array and a semi-static array that must be resized through communication with the host. Experimental evaluation shows that the GGArray has competitive insertion and resize performance, but is slower for regular parallel memory accesses. Given these results, the GGArray is a potentially useful structure for applications with high uncertainty about memory usage, as well as for applications that run in phases, such as an insertion phase followed by a regular GPU phase. In such cases, the GGArray can be used for the first phase, and the data can then be flattened for the second phase so that it benefits from classical GPU memory accesses, which are faster. These results constitute a step towards a parallel-efficient, C++-like vector for modern GPU architectures.
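To make the layout concrete, the following is a minimal, single-threaded C++ sketch of the idea described in the abstract: an array of bucketed vectors, one per thread block, that grows without moving existing elements and can be flattened into one contiguous array afterwards. It is not the authors' GPU implementation. The names `BucketVector`, `GrowableArray`, `bucket_of`, `offset_of`, and `flatten`, the first bucket size of 8, and the doubling bucket layout are all assumptions made for illustration, and the sketch contains no atomics or thread-block synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Assumed for illustration: the first bucket holds 2^3 = 8 elements.
constexpr std::size_t kFirstBucketLog2 = 3;
constexpr std::size_t kFirstBucketSize = std::size_t{1} << kFirstBucketLog2;

// floor(log2(x)) for x > 0.
inline std::size_t floor_log2(std::size_t x) {
    std::size_t r = 0;
    while (x >>= 1) ++r;
    return r;
}

// Hypothetical LFVector-style container: bucket b holds kFirstBucketSize * 2^b
// elements, so growing allocates a new bucket and never copies old elements.
struct BucketVector {
    std::vector<std::unique_ptr<int[]>> buckets;
    std::size_t count = 0;

    // Bucket that stores logical index i.
    static std::size_t bucket_of(std::size_t i) {
        return floor_log2(i + kFirstBucketSize) - kFirstBucketLog2;
    }
    // Position of logical index i inside its bucket.
    static std::size_t offset_of(std::size_t i) {
        std::size_t pos = i + kFirstBucketSize;
        return pos - (std::size_t{1} << floor_log2(pos));
    }
    void push_back(int v) {
        std::size_t b = bucket_of(count);
        if (b == buckets.size()) {  // the next bucket does not exist yet
            buckets.push_back(std::make_unique<int[]>(kFirstBucketSize << b));
        }
        buckets[b][offset_of(count)] = v;
        ++count;
    }
    int at(std::size_t i) const {
        assert(i < count);
        return buckets[bucket_of(i)][offset_of(i)];
    }
};

// Hypothetical stand-in for the GGArray: one BucketVector per thread block.
struct GrowableArray {
    std::vector<BucketVector> parts;

    explicit GrowableArray(std::size_t num_blocks) : parts(num_blocks) {}

    // Insert a value into the vector owned by the given (simulated) block.
    void insert(std::size_t block, int v) {
        assert(block < parts.size());
        parts[block].push_back(v);
    }
    std::size_t total() const {
        std::size_t n = 0;
        for (const auto& p : parts) n += p.count;
        return n;
    }
    // Copy every element into one contiguous array, block by block.
    std::vector<int> flatten() const {
        std::vector<int> flat;
        flat.reserve(total());
        for (const auto& p : parts) {
            for (std::size_t i = 0; i < p.count; ++i) flat.push_back(p.at(i));
        }
        return flat;
    }
};
```

In this sketch every read goes through `bucket_of` and `offset_of` before it reaches the data, which hints at why regular accesses cost more than indexing a flat array. `flatten` plays the role of the hand-off between an insertion phase and a regular access phase.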