Paper Title
Enabling Highly-Scalable Remote Memory Access Programming with MPI-3 One Sided
Paper Authors
Abstract
Modern interconnects offer remote direct memory access (RDMA) features. Yet most applications rely on explicit message passing for communication, despite its unwanted overheads. The MPI-3.0 standard defines a programming interface for exploiting RDMA networks directly; however, its scalability and practicability have to be demonstrated in practice. In this work, we develop scalable bufferless protocols that implement the MPI-3.0 specification. Our protocols support scaling to millions of cores with negligible memory consumption while providing highest performance and minimal overheads. To arm programmers, we provide a spectrum of performance models for all critical functions and demonstrate the usability of our library and models with several application studies with up to half a million processes. We show that our design is comparable to, or better than, UPC and Fortran Coarrays in terms of latency, bandwidth, and message rate. We also demonstrate application performance improvements with comparable programming complexity.