Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

Paper Title

Transparent Compiler and Runtime Specializations for Accelerating Managed Languages on FPGAs

Authors

Michail Papadimitriou, Juan Fumero, Athanasios Stratikopoulos, Foivos S. Zakkak, Christos Kotselidis

Abstract

In recent years, heterogeneous computing has emerged as a vital way to increase computers' performance and energy efficiency by combining diverse hardware devices, such as Graphics Processing Units (GPUs) and Field Programmable Gate Arrays (FPGAs). The rationale behind this trend is that different parts of an application can be offloaded from the main CPU to diverse devices, which can efficiently execute these parts as co-processors. FPGAs are a subset of the most widely used co-processors, typically used for accelerating specific workloads due to their flexible hardware and energy-efficient characteristics. These characteristics have made them prevalent in a broad spectrum of computing systems ranging from low-power embedded systems to high-end data centers and cloud infrastructures. However, these hardware characteristics come at the cost of programmability. Developers who create their applications using high-level programming languages (e.g., Java, Python) are required to familiarize themselves with a hardware description language (e.g., VHDL, Verilog) or, more recently, heterogeneous programming models (e.g., OpenCL, HLS) in order to exploit the co-processors' capacity and tune the performance of their applications. Currently, the above-mentioned heterogeneous programming models exclusively support compilation from compiled languages, such as C and C++. Thus, the transparent integration of heterogeneous co-processors into the software ecosystem of managed programming languages (e.g., Java, Python) is not seamless. In this paper, we rethink the engineering trade-offs that we encountered, in terms of transparency and compilation overheads, while integrating FPGAs into high-level managed programming languages. We present a novel approach that enables runtime code specialization techniques for seamless and high-performance execution of Java programs on FPGAs. The proposed solution is prototyped in the context of the Java programming language and TornadoVM, an open-source programming framework for Java execution on heterogeneous hardware. Finally, we evaluate the proposed solution for FPGA execution against both sequential and multi-threaded Java implementations, showcasing up to 224x and 19.8x performance speedups, respectively, and up to 13.82x compared to TornadoVM running on an Intel integrated GPU. We also provide a breakdown analysis of the proposed compiler optimizations for FPGA execution, as a means to project their impact on the applications' characteristics.
