Paper Title
Online Reinforcement Learning Control by Direct Heuristic Dynamic Programming: from Time-Driven to Event-Driven
Paper Authors
Paper Abstract
In this paper, time-driven learning refers to a machine learning method that updates parameters in a prediction model continuously as new data arrives. Among existing approximate dynamic programming (ADP) and reinforcement learning (RL) algorithms, direct heuristic dynamic programming (dHDP) has been shown to be an effective tool, as demonstrated in solving several complex learning control problems. It continuously updates the control policy and the critic as the system states evolve. It is therefore desirable to prevent the time-driven dHDP from updating in response to insignificant system events such as noise. Toward this goal, we propose a new event-driven dHDP. By constructing a Lyapunov function candidate, we prove the uniform ultimate boundedness (UUB) of the system states and of the weights in the critic and control policy networks. Consequently, we show that the approximate control and cost-to-go function approach Bellman optimality within a finite bound. We also illustrate how the event-driven dHDP algorithm works in comparison to the original time-driven dHDP.
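
The abstract contrasts time-driven updates (performed at every time step) with event-driven updates (performed only when the state has changed significantly since the last event). The Python sketch below illustrates just that triggering idea on a toy scalar plant; it is a minimal sketch, not the paper's formulation. The plant parameters, the linear-in-parameter critic and actor, the quadratic stage cost, the threshold delta, and the learning rates are all illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

# Toy scalar plant x_{k+1} = a*x_k + b*u_k + noise (assumed for illustration).
a, b = 0.9, 0.5
alpha = 0.95             # discount factor in the TD error
lr_c, lr_a = 0.05, 0.01  # critic/actor learning rates (assumed)
delta = 0.05             # event-triggering threshold on the state gap

def phi_c(x, u):
    # Critic features over (state, control); in dHDP the critic sees both.
    return np.array([x, u, x * u, x**2, u**2, 1.0])

def dphi_c_du(x, u):
    # Partial derivative of the critic features w.r.t. the control input.
    return np.array([0.0, 1.0, x, 0.0, 2.0 * u, 0.0])

def phi_a(x):
    # Actor features over the state.
    return np.array([x, x**2, 1.0])

w_c = rng.normal(scale=0.1, size=6)  # critic weights
w_a = rng.normal(scale=0.1, size=3)  # actor (control policy) weights

x, x_event = 1.0, 1.0
u = float(np.clip(w_a @ phi_a(x), -2.0, 2.0))
J_prev = 0.0
events = 0

for k in range(300):
    # Event trigger: sample, recompute the control, and update the weights
    # only when the gap between the current state and the state at the last
    # event is significant; small (e.g., noise-driven) deviations are ignored.
    if abs(x - x_event) > delta or k == 0:
        events += 1
        x_event = x
        u = float(np.clip(w_a @ phi_a(x_event), -2.0, 2.0))
        r = x**2 + 0.1 * u**2                 # quadratic stage cost (assumed)
        J = float(w_c @ phi_c(x, u))

        # dHDP-style temporal-difference error: e_c = alpha*J(k) - (J(k-1) - r(k)).
        e_c = alpha * J - (J_prev - r)
        w_c -= lr_c * e_c * alpha * phi_c(x, u)  # critic gradient step

        # Actor step: drive J toward zero through dJ/du via the chain rule.
        dJ_du = float(w_c @ dphi_c_du(x, u))
        w_a -= lr_a * J * dJ_du * phi_a(x_event)

        J_prev = J
    # Between events the control is held constant (zero-order hold)
    # while the plant keeps evolving in time.
    x = a * x + b * u + rng.normal(scale=0.01)

print(f"final |x| = {abs(x):.4f}, weight updates = {events}/300")

Under these assumptions the weights are updated only at the printed fraction of the 300 time steps, which is the practical appeal of the event-driven scheme: fewer controller and learning updates while the state still settles near the origin.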