Paper Title
Mean-Variance Efficient Reinforcement Learning with Applications to Dynamic Financial Investment
Paper Authors
Paper Abstract
This study investigates the mean-variance (MV) trade-off in reinforcement learning (RL), an instance of sequential decision-making under uncertainty. Our objective is to obtain MV-efficient policies whose means and variances lie on the Pareto-efficient frontier with respect to the MV trade-off; on this frontier, any increase in the expected reward necessitates a corresponding increase in variance, and vice versa. To this end, we propose a method that trains a policy to maximize the expected quadratic utility, defined as a weighted sum of the first and second moments of the rewards obtained under the policy. We then show that the maximizer indeed qualifies as an MV-efficient policy. Previous studies that employed constrained optimization to address the MV trade-off encountered computational challenges. Our approach is more computationally efficient because it eliminates the need for gradient estimation of the variance, which causes the double-sampling issue observed in existing methods. Through experiments, we validate the efficacy of our approach.
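The key computational point in the abstract can be illustrated with a minimal sketch (not the paper's actual algorithm): because the quadratic utility U(R) = R − (λ/2)R² is a function of a single return sample, a REINFORCE-style gradient of E[U(R)] needs only one rollout per update, whereas a direct gradient of the variance Var(R) = E[R²] − (E[R])² requires two independent samples (the double-sampling issue). The two-armed Gaussian bandit below, the risk-aversion weight λ, and the learning rate are all hypothetical choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-armed Gaussian bandit:
# arm 0 has a higher mean reward but much higher variance.
MEANS = np.array([1.0, 0.5])
STDS = np.array([2.0, 0.1])
LAM = 0.5  # risk-aversion weight in the quadratic utility

def quadratic_utility(r, lam=LAM):
    # U(r) = r - (lam/2) * r^2 is an unbiased single-sample estimate of
    # E[R] - (lam/2) * E[R^2], a weighted sum of the first and second
    # moments -- so no second independent sample of R is needed.
    return r - 0.5 * lam * r ** 2

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

theta = np.zeros(2)  # policy logits over the two arms
lr = 0.1
for _ in range(2000):
    p = softmax(theta)
    a = rng.choice(2, p=p)
    r = rng.normal(MEANS[a], STDS[a])
    # REINFORCE gradient of E[U(R)]: grad log pi(a) scaled by U(r).
    grad_logp = -p
    grad_logp[a] += 1.0
    theta += lr * quadratic_utility(r) * grad_logp

p = softmax(theta)
```

With λ = 0.5, the per-arm objective μ − (λ/2)(μ² + σ²) is −0.25 for the high-variance arm and about 0.435 for the low-variance arm, so the learned policy should concentrate on arm 1 despite its lower mean reward.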