Paper Title

Reducing Action Space: Reference-Model-Assisted Deep Reinforcement Learning for Inverter-based Volt-Var Control

Paper Authors

Qiong Liu, Ye Guo, Lirong Deng, Haotian Liu, Dongyu Li, Hongbin Sun

Paper Abstract

Reference-model-assisted deep reinforcement learning (DRL) for inverter-based Volt-Var Control (IB-VVC) in active distribution networks is proposed. We show that a large action space increases the learning difficulty of DRL and degrades the optimization performance in both the data-generation and neural-network-training processes. To reduce the action space of DRL, we design a reference-model-assisted DRL approach and introduce the definitions of the reference model, reference-model-based optimization, and reference actions. The reference-model-assisted DRL learns the residual actions between the reference actions and the optimal actions, rather than learning the optimal actions directly. Since, given a reference model, the residual actions are considerably smaller than the optimal actions, we can design a smaller action space for the reference-model-assisted DRL, which reduces the learning difficulty and improves the optimization performance of the approach. Notably, the reference-model-assisted approach is compatible with any policy-gradient DRL algorithm for continuous-action problems. This work takes the soft actor-critic algorithm as an example and designs a reference-model-assisted soft actor-critic algorithm. Simulations show that 1) a large action space degrades the performance of DRL throughout the training stage, and 2) the reference-model-assisted DRL requires fewer training iterations and achieves better optimization performance.
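
To make the residual-action idea concrete, below is a minimal sketch of how the action composition could look in code. It assumes a Gym-style environment with `action_low`/`action_high` bounds and a `reference_model.solve()` method; all names and the `residual_scale` parameter are illustrative assumptions, not the paper's actual API.

```python
import numpy as np

class ResidualActionWrapper:
    """Sketch of the residual-action mechanism described in the abstract.

    The DRL agent (e.g., soft actor-critic) outputs only a small residual
    in [-1, 1]^n; the executed inverter set-point is the reference action
    from a model-based solver plus that scaled residual.
    """

    def __init__(self, env, reference_model, residual_scale=0.1):
        self.env = env
        self.reference_model = reference_model  # solves the approximate VVC problem (assumed interface)
        self.residual_scale = residual_scale    # residual bound as a fraction of the full action range
        self._obs = None

    def reset(self):
        self._obs = self.env.reset()
        return self._obs

    def step(self, residual_action):
        # Reference action from the cheap, approximate model-based optimization.
        a_ref = self.reference_model.solve(self._obs)
        # Executed action = reference action + bounded residual, clipped to inverter limits.
        span = self.env.action_high - self.env.action_low
        a = np.clip(a_ref + self.residual_scale * span * np.asarray(residual_action),
                    self.env.action_low, self.env.action_high)
        self._obs, reward, done, info = self.env.step(a)
        return self._obs, reward, done, info
```

Because the agent's action space covers only a small fraction of the full set-point range around the reference solution, the policy explores a much tighter region, which is the mechanism behind the abstract's claim of reduced learning difficulty.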
