Towards effective assessment of steady state performance in Java software: Are we there yet?

Paper Title

Towards effective assessment of steady state performance in Java software: Are we there yet?

Authors

Luca Traini, Vittorio Cortellessa, Daniele Di Pompeo, Michele Tucci

Abstract

Microbenchmarking is a widely used form of performance testing in Java software. A microbenchmark repeatedly executes a small chunk of code while collecting measurements related to its performance. Due to Java Virtual Machine optimizations, microbenchmarks are usually subject to severe performance fluctuations in the first phase of their execution (also known as warmup). For this reason, software developers typically discard the measurements of this phase and focus their analysis on the measurements collected once benchmarks reach a steady state of performance. Developers estimate the end of the warmup phase based on their expertise, and configure their benchmarks accordingly. Unfortunately, this approach is based on two strong assumptions: (i) benchmarks always reach a steady state of performance and (ii) developers accurately estimate warmup. In this paper, we show that Java microbenchmarks do not always reach a steady state, and that developers often fail to accurately estimate the end of the warmup phase. We found that a considerable portion of the studied benchmarks do not hit the steady state, and that warmup estimates provided by software developers are often inaccurate (with a large error). This has significant implications both in terms of results quality and time effort. Furthermore, we found that dynamic reconfiguration significantly improves warmup estimation accuracy, but it still induces suboptimal warmup estimates and relevant side effects. We envision this paper as a starting point for supporting the introduction of more sophisticated automated techniques that can ensure results quality in a timely fashion.
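The abstract contrasts a warmup length that developers fix in advance with measurements that may never settle. As a rough illustration only, the sketch below estimates the end of warmup from the data itself. A hypothetical `WarmupDetector` class applies a simple sliding-window coefficient-of-variation (CoV) heuristic to per-iteration timings. The class and method names, the window size of 3, the 5% threshold and the sample timings are all made-up assumptions. This is not the steady-state detection method used in the paper, nor the dynamic reconfiguration technique it evaluates.

```java
/**
 * Hypothetical illustration: estimates where warmup ends in a series of
 * per-iteration timings with a sliding-window coefficient-of-variation (CoV)
 * heuristic. This is NOT the detection method used in the paper.
 */
final class WarmupDetector {

    private WarmupDetector() {}

    /** CoV (population standard deviation / mean) of values[from, to); assumes positive values. */
    static double coefficientOfVariation(double[] values, int from, int to) {
        int n = to - from;
        double sum = 0.0;
        for (int i = from; i < to; i++) {
            sum += values[i];
        }
        double mean = sum / n;
        double squares = 0.0;
        for (int i = from; i < to; i++) {
            double d = values[i] - mean;
            squares += d * d;
        }
        return Math.sqrt(squares / n) / mean;
    }

    /**
     * Returns the first iteration index from which every remaining window of
     * {@code window} timings has CoV <= {@code threshold}, or -1 if no such
     * index exists (the series never reaches a steady state).
     */
    static int detectSteadyStateStart(double[] timings, int window, double threshold) {
        if (window < 2 || timings.length < window) {
            return -1; // not enough data to judge stability
        }
        int lastUnstable = -1; // start index of the last window above the threshold
        for (int start = 0; start + window <= timings.length; start++) {
            if (coefficientOfVariation(timings, start, start + window) > threshold) {
                lastUnstable = start;
            }
        }
        int candidate = lastUnstable + 1;
        // If even the final window is unstable, there is no stable window left.
        return candidate + window <= timings.length ? candidate : -1;
    }

    public static void main(String[] args) {
        // Made-up timings (ms): warmup fluctuations, then a steady state.
        double[] settles = {100, 50, 10, 10, 10, 10, 10};
        // Made-up timings (ms) that keep drifting and never settle.
        double[] drifts = {10, 12, 14, 16, 18, 20, 22};
        int developerEstimate = 4; // hypothetical warmup iteration count fixed in advance

        System.out.println("steady state from iteration "
                + detectSteadyStateStart(settles, 3, 0.05)
                + " (developer estimate: " + developerEstimate + ")");
        System.out.println("drifting series: " + detectSteadyStateStart(drifts, 3, 0.05));
    }
}
```

In the made-up settling series, the heuristic places the steady state at iteration 2. A fixed warmup of 4 iterations would therefore discard two usable measurements. For the drifting series it returns -1. This is the no-steady-state case, and a warmup count fixed in advance cannot detect it. Requiring every later window to stay under the threshold is a deliberately strict and simple choice. Real measurements are noisier, so a more robust statistical technique would likely be needed in practice.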
