Pim-Tree：用于内存处理的偏斜索引

论文标题

Pim-Tree：用于内存处理的偏斜索引

PIM-tree: A Skew-resistant Index for Processing-in-Memory

论文作者

Kang, Hongbo, Zhao, Yiwei, Blelloch, Guy E., Dhulipala, Laxman, Gu, Yan, McGuffey, Charles, Gibbons, Phillip B.

论文摘要

当今的内存索引的性能是由内存延迟/带宽墙瓶颈瓶颈。记忆中的处理（PIM）是一种新兴方法，它通过启用低延迟内存访问的范围内存储器带宽量表的低延迟内存访问，可以减轻这种瓶颈。但是，在工作负载偏斜的情况下，在最小化节点间通信与实现PIM系统中的负载平衡之间存在固有的张力。本文介绍了PIM-Tree，这是PIM系统的有序索引，无论数据和查询中的偏差程度如何，都可以达到低通信和高负载平衡。我们的抗偏压指数是基于多核主机CPU和PIM节点之间的新型劳动分裂的，该分支利用了每个节点的优势。我们介绍了推送搜索，该搜索是动态决定是将查询推入PIM-TREE节点（CPU-> pim节点），还是将节点的键拉回基于工作负载偏斜的CPU（PIM节点 - > CPU）。结合其他对PIM友好的优化（阴影子树和块状跳过列表），我们的Pim-Tree提供了高通量，（保证的）低沟通和（保证）高负载余额，用于批次查询，更新，更新和范围扫描。除了先前提出的PIM索引外，我们在UPMEM的最新PIM系统上实施了PIM-Tree结构，其中32个CPU核心和2048 PIM节点。在具有5亿键和100万查询的批次的工作负载上，使用PIM-Trees的吞吐量高达69.7倍，比两种最佳先前方法高59.1倍。据我们所知，这些是对真实PIM系统上有序索引的第一个实现。

The performance of today's in-memory indexes is bottlenecked by the memory latency/bandwidth wall. Processing-in-memory (PIM) is an emerging approach that potentially mitigates this bottleneck, by enabling low-latency memory access whose aggregate memory bandwidth scales with the number of PIM nodes. There is an inherent tension, however, between minimizing inter-node communication and achieving load balance in PIM systems, in the presence of workload skew. This paper presents PIM-tree, an ordered index for PIM systems that achieves both low communication and high load balance, regardless of the degree of skew in the data and the queries. Our skew-resistant index is based on a novel division of labor between the multi-core host CPU and the PIM nodes, which leverages the strengths of each. We introduce push-pull search, which dynamically decides whether to push queries to a PIM-tree node (CPU -> PIM-node) or pull the node's keys back to the CPU (PIM-node -> CPU) based on workload skew. Combined with other PIM-friendly optimizations (shadow subtrees and chunked skip lists), our PIM-tree provides high-throughput, (guaranteed) low communication, and (guaranteed) high load balance, for batches of point queries, updates, and range scans. We implement the PIM-tree structure, in addition to prior proposed PIM indexes, on the latest PIM system from UPMEM, with 32 CPU cores and 2048 PIM nodes. On workloads with 500 million keys and batches of one million queries, the throughput using PIM-trees is up to 69.7x and 59.1x higher than the two best prior methods. As far as we know these are the first implementations of an ordered index on a real PIM system.

下载PDF全文

下载文献需遵守相关版权规定

论文标题