Paper title
GraphScale: Scalable Bandwidth-Efficient Graph Processing on FPGAs
Authors
Abstract
Recent advances in graph processing on FPGAs promise to alleviate performance bottlenecks with irregular memory access patterns. Such bottlenecks challenge performance for a growing number of important application areas like machine learning and data analytics. While FPGAs denote a promising solution through flexible memory hierarchies and massive parallelism, we argue that current graph processing accelerators either use the off-chip memory bandwidth inefficiently or do not scale well across memory channels. In this work, we propose GraphScale, a scalable graph processing framework for FPGAs. For the first time, GraphScale combines multi-channel memory with asynchronous graph processing (i.e., for fast convergence on results) and a compressed graph representation (i.e., for efficient usage of memory bandwidth and reduced memory footprint). GraphScale solves common graph problems like breadth-first search, PageRank, and weakly-connected components through modular user-defined functions, a novel two-dimensional partitioning scheme, and a high-performance two-level crossbar design.
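Two ideas named in the abstract, a compressed graph representation and asynchronous processing for faster convergence, can be illustrated with a small software sketch. This is not GraphScale's hardware design or its actual data layout: it assumes compressed sparse row (CSR) as one common compressed format, uses label propagation as a simple way to compute weakly-connected components, and the names `build_csr` and `wcc` are hypothetical.

```python
def build_csr(num_vertices, edges):
    """Build a CSR (offsets, neighbors) pair from an undirected edge list.

    CSR is assumed here as an example of a compressed representation;
    the abstract does not state which format GraphScale uses.
    """
    degree = [0] * num_vertices
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # offsets[v] .. offsets[v + 1] delimits the neighbors of vertex v.
    offsets = [0] * (num_vertices + 1)
    for v in range(num_vertices):
        offsets[v + 1] = offsets[v] + degree[v]
    neighbors = [0] * offsets[-1]
    cursor = offsets[:-1].copy()
    for u, v in edges:
        neighbors[cursor[u]] = v
        cursor[u] += 1
        neighbors[cursor[v]] = u
        cursor[v] += 1
    return offsets, neighbors


def wcc(offsets, neighbors, asynchronous):
    """Label propagation for weakly-connected components.

    Returns (labels, passes), where passes counts full sweeps over the
    vertices, including the final sweep that detects convergence.
    """
    n = len(offsets) - 1
    labels = list(range(n))
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        # Synchronous: read labels from a snapshot of the previous pass.
        # Asynchronous: read from the array being updated, so new labels
        # are visible to later vertices within the same pass.
        source = labels if asynchronous else labels.copy()
        for v in range(n):
            best = source[v]
            for i in range(offsets[v], offsets[v + 1]):
                best = min(best, source[neighbors[i]])
            if best < labels[v]:
                labels[v] = best
                changed = True
    return labels, passes


if __name__ == "__main__":
    # A six-vertex path plus one separate edge: two components.
    edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (6, 7)]
    offsets, neighbors = build_csr(8, edges)
    for mode in (False, True):
        labels, passes = wcc(offsets, neighbors, asynchronous=mode)
        print("asynchronous" if mode else "synchronous", labels, passes)
```

On this small input both modes produce the same component labels, but the asynchronous variant needs fewer passes because a label can travel along the whole path within a single sweep. The effect depends on vertex order and graph shape, so it should be read as an illustration of the principle rather than a performance claim about GraphScale.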