Title

Single-step deep reinforcement learning for open-loop control of laminar and turbulent flows

Authors

Ghraieb, H., Viquerat, J., Larcher, A., Meliga, P., Hachem, E.

Abstract

This research gauges the ability of deep reinforcement learning (DRL) techniques to assist the optimization and control of fluid mechanical systems. It combines a novel, "degenerate" version of the proximal policy optimization (PPO) algorithm, that trains a neural network in optimizing the system only once per learning episode, and an in-house stabilized finite elements environment implementing the variational multiscale (VMS) method, that computes the numerical reward fed to the neural network. Three prototypical examples of separated flows in two dimensions are used as testbed for developing the methodology, each of which adds a layer of complexity due either to the unsteadiness of the flow solutions, or the sharpness of the objective function, or the dimension of the control parameter space. Relevance is carefully assessed by comparing systematically to reference data obtained by canonical direct and adjoint methods. Beyond adding value to the shallow literature on this subject, these findings establish the potential of single-step PPO for reliable black-box optimization of computational fluid dynamics (CFD) systems, which paves the way for future progress in optimal flow control using this new class of methods.
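As a rough illustration of the "single-step" idea described above (not the authors' implementation), the sketch below trains a Gaussian policy that receives a constant dummy input and proposes a full set of control parameters once per learning episode, then updates it with the standard clipped PPO surrogate objective. A placeholder quadratic function stands in for the CFD/VMS reward; all names, dimensions, and hyperparameters are illustrative assumptions.

    # Minimal single-step PPO sketch: the policy sees a constant input, emits
    # control parameters once per episode, and learns from a black-box reward.
    import torch

    n_params = 2                      # dimension of the control parameter space (assumed)
    n_episodes, batch_size = 50, 16   # hypothetical training budget
    policy = torch.nn.Sequential(torch.nn.Linear(1, 32), torch.nn.Tanh(),
                                 torch.nn.Linear(32, n_params))
    log_std = torch.zeros(n_params, requires_grad=True)
    opt = torch.optim.Adam(list(policy.parameters()) + [log_std], lr=5e-3)
    dummy = torch.ones(1, 1)          # constant "state": the input never changes

    def reward(x):                    # placeholder for the CFD/VMS environment
        return -((x - 0.5) ** 2).sum(dim=-1)

    for ep in range(n_episodes):
        with torch.no_grad():
            mu = policy(dummy).squeeze(0)
            dist = torch.distributions.Normal(mu, log_std.exp())
            actions = dist.sample((batch_size,))      # one action per episode
            old_logp = dist.log_prob(actions).sum(-1)
            r = reward(actions)
            adv = (r - r.mean()) / (r.std() + 1e-8)   # normalized advantage

        for _ in range(8):                            # PPO epochs on the batch
            mu = policy(dummy).squeeze(0)
            dist = torch.distributions.Normal(mu, log_std.exp())
            ratio = (dist.log_prob(actions).sum(-1) - old_logp).exp()
            clipped = torch.clamp(ratio, 0.8, 1.2) * adv   # clip range 0.2
            loss = -torch.min(ratio * adv, clipped).mean()
            opt.zero_grad(); loss.backward(); opt.step()

In an actual flow-control setting, the placeholder reward function would be replaced by a call to the CFD solver evaluating the objective (e.g., drag or a recirculation measure) for the proposed control parameters.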
