Hardness in Markov Decision Processes: Theory and Practice

Paper title

Hardness in Markov Decision Processes: Theory and Practice

Authors

Michelangelo Conserva, Paulo Rauber

Abstract

Meticulously analysing the empirical strengths and weaknesses of reinforcement learning methods in hard (challenging) environments is essential to inspire innovations and assess progress in the field. In tabular reinforcement learning, there is no well-established standard selection of environments to conduct such analysis, which is partially due to the lack of a widespread understanding of the rich theory of hardness of environments. The goal of this paper is to unlock the practical usefulness of this theory through four main contributions. First, we present a systematic survey of the theory of hardness, which also identifies promising research directions. Second, we introduce Colosseum, a pioneering package that enables empirical hardness analysis and implements a principled benchmark composed of environments that are diverse with respect to different measures of hardness. Third, we present an empirical analysis that provides new insights into computable measures. Finally, we benchmark five tabular agents in our newly proposed benchmark. While advancing the theoretical understanding of hardness in non-tabular reinforcement learning remains essential, our contributions in the tabular setting are intended as solid steps towards a principled non-tabular benchmark. Accordingly, we benchmark four agents in non-tabular versions of Colosseum environments, obtaining results that demonstrate the generality of tabular hardness measures.
