Paper Title
Understanding Memory Access Patterns Using the BSC Performance Tools
Paper Authors
Paper Abstract
As processors evolve, the growing gap between processor and memory speeds leads to increasingly complex memory hierarchies, which mitigate this gap by exploiting locality of reference. In this direction, the BSC performance analysis tools have recently been extended to provide insight into an application's memory accesses, depicting their temporal and spatial characteristics and correlating them simultaneously with the source code and the achieved performance. These extensions rely on the Precise Event-Based Sampling (PEBS) mechanism available in recent Intel processors to capture information about the application's memory accesses. The sampled information is later combined with the Folding technique to represent the detailed temporal evolution of the memory accesses in conjunction with the achieved performance and the corresponding source code. The results obtained by combining these tools help not only application developers but also processor architects to better understand how the application behaves and how the system performs. In this paper, we describe a tighter integration of the sampling mechanism into the monitoring package. We also demonstrate the value of the complete workflow by exploring already optimized state-of-the-art benchmarks, providing detailed insight into their memory access behavior. We have taken advantage of this insight to apply small modifications that improve the applications' performance.