Paper Title

Reliable validation of Reinforcement Learning Benchmarks

Authors

Matthias Müller-Brockhausen, Aske Plaat, Mike Preuss

Abstract

Reinforcement Learning (RL) is one of the most dynamic research areas in Game AI and AI as a whole, and a wide variety of games are used as its prominent test problems. However, it is subject to the replicability crisis that currently affects most algorithmic AI research. Benchmarking in Reinforcement Learning could be improved through verifiable results. There are numerous benchmark environments whose scores are used to compare different algorithms, such as Atari. Nevertheless, reviewers must trust that figures represent truthful values, as it is difficult to reproduce an exact training curve. We propose improving this situation by providing access to the original experimental data to validate study results. To that end, we rely on the concept of minimal traces. These allow re-simulation of action sequences in deterministic RL environments and, in turn, enable reviewers to verify, re-use, and manually inspect experimental results without needing large compute clusters. It also permits validation of presented reward graphs, an inspection of individual episodes, and re-use of result data (baselines) for proper comparison in follow-up papers. We offer plug-and-play code that works with Gym so that our measures fit well in the existing RL and reproducibility eco-system. Our approach is freely available, easy to use, and adds minimal overhead, as minimal traces allow a data compression ratio of up to $\approx 10^4:1$ (94GB to 8MB for Atari Pong) compared to a regular MDP trace used in offline RL datasets. The paper presents proof-of-concept results for a variety of games.
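To make the idea concrete: a minimal trace for a deterministic Gym-style environment only needs the environment id, the seed, and the action sequence; anyone can then re-simulate the episode and check the claimed return without the original observations or model. The sketch below is a hypothetical illustration written against the Gymnasium API (the maintained successor of Gym), not the authors' plug-and-play code; the names `record_minimal_trace` and `verify_minimal_trace` are chosen here purely for illustration.

```python
import gymnasium as gym  # pip install gymnasium; the paper's code targets the classic Gym API


def record_minimal_trace(env_id, policy, seed=0):
    """Run one episode and keep only what is needed to re-simulate it:
    the environment id, the seed, and the action sequence."""
    env = gym.make(env_id)
    obs, _ = env.reset(seed=seed)
    actions, claimed_return, done = [], 0.0, False
    while not done:
        action = policy(obs)
        obs, reward, terminated, truncated, _ = env.step(action)
        done = terminated or truncated
        actions.append(action)
        claimed_return += reward
    env.close()
    return {"env_id": env_id, "seed": seed,
            "actions": actions, "claimed_return": claimed_return}


def verify_minimal_trace(trace):
    """Re-simulate the stored actions in a fresh, identically seeded
    environment and check that the recomputed return matches the claim."""
    env = gym.make(trace["env_id"])
    env.reset(seed=trace["seed"])
    recomputed = 0.0
    for action in trace["actions"]:
        _, reward, terminated, truncated, _ = env.step(action)
        recomputed += reward
        if terminated or truncated:
            break
    env.close()
    return recomputed == trace["claimed_return"]


# Example: a fixed (deterministic) policy on CartPole-v1
trace = record_minimal_trace("CartPole-v1", policy=lambda obs: 0, seed=42)
assert verify_minimal_trace(trace)
```

For Atari-style environments one would additionally store settings that affect dynamics (frameskip, sticky actions, library versions), since verification only holds when re-simulation is exactly deterministic.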
