Paper Title

A Fast Analytical Model of Fully Associative Caches

Authors

Tobias Gysi, Tobias Grosser, Laurin Brandner, Torsten Hoefler

Abstract

While the cost of computation is an easy-to-understand local property, the cost of data movement on cached architectures depends on global state, does not compose, and is hard to predict. As a result, programmers often fail to consider the cost of data movement. Existing cache models and simulators provide the missing information but are computationally expensive. We present a lightweight cache model for fully associative caches with a least recently used (LRU) replacement policy that gives fast and accurate results. We count cache misses without explicit enumeration of all memory accesses by using symbolic counting techniques twice: 1) to derive the stack distance for each memory access and 2) to count the memory accesses with stack distance larger than the cache size. While this technique seems infeasible in theory, due to non-linearities after the first round of counting, we show that the counting problems are sufficiently linear in practice. Our cache model often computes the results within seconds, and, in contrast to simulation, the execution time is mostly independent of the problem size. Our evaluation measures modeling errors below 0.6% on real hardware. By providing accurate data placement information, we enable memory-hierarchy-aware software development.
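
To make the miss criterion concrete, below is a minimal sketch (in Python; the function name, trace, and cache capacity are illustrative assumptions, not taken from the paper): for a fully associative LRU cache, an access misses exactly when its stack distance is larger than the cache capacity. The sketch counts misses by explicitly enumerating a small access trace, which is precisely the enumeration the paper's symbolic counting avoids.

```python
def count_misses(trace, cache_lines):
    """Count misses of a fully associative LRU cache via stack distances."""
    last_seen = {}  # address -> index of its most recent access
    misses = 0
    for i, addr in enumerate(trace):
        if addr not in last_seen:
            misses += 1  # cold miss: no prior access, stack distance is infinite
        else:
            # Distinct addresses touched strictly between the two accesses;
            # the access misses when that count reaches the cache capacity
            # (equivalently, the stack distance exceeds the cache size).
            if len(set(trace[last_seen[addr] + 1:i])) >= cache_lines:
                misses += 1
        last_seen[addr] = i
    return misses


# Example: 2-line cache, "a" is evicted by "b" and "c" before its reuse -> 4 misses.
print(count_misses(["a", "b", "c", "a"], cache_lines=2))
```

This explicit pass over the trace costs time proportional to the trace length, which is exactly why the paper replaces it with symbolic counting whose running time is mostly independent of the problem size.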
