Paper Title
Multi-Asset Closed-Loop Reservoir Management Using Deep Reinforcement Learning
Paper Authors
Paper Abstract
Closed-loop reservoir management (CLRM), in which history matching and production optimization are performed multiple times over the life of an asset, can provide significant improvement in the specified objective. These procedures are computationally expensive because data assimilation and optimization require a large number of flow simulations. Existing CLRM procedures are applied asset by asset, without utilizing information that could be useful over a range of assets. Here, we develop a CLRM framework for multiple assets with varying numbers of wells. We use deep reinforcement learning to train a single global control policy that is applicable to all assets considered. The new framework is an extension of a recently introduced control policy methodology for individual assets. Embedding layers are incorporated into the representation to handle the different numbers of decision variables that arise for the different assets. Because the global control policy learns a unified representation of useful features from multiple assets, it is less expensive to construct than asset-by-asset training (we observe about a 3x speedup in our examples). The production optimization problem includes a relative-change constraint on the well settings, which renders the results suitable for practical use. We apply the multi-asset CLRM framework to 2D and 3D water-flooding examples. In both cases, four assets with different well counts, well configurations, and geostatistical descriptions are considered. Numerical experiments demonstrate that the global control policy provides objective function values, for both the 2D and 3D cases, that are nearly identical to those from control policies trained individually for each asset. This promising finding suggests that multi-asset CLRM may indeed represent a viable practical strategy.
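The abstract mentions a relative-change constraint on the well settings but does not state its form. As a rough illustration only, the Python sketch below assumes the constraint bounds each well setting to within a fixed fraction of its value at the previous control step. The function name `apply_relative_change_limit`, the parameter `max_rel_change`, and the numeric values are hypothetical and are not taken from the paper.

```python
import numpy as np


def apply_relative_change_limit(proposed, previous, max_rel_change=0.1,
                                lower=None, upper=None):
    """Clip proposed well settings to a relative-change window.

    Assumption (not from the paper): each setting may move by at most
    max_rel_change * |previous value| between consecutive control steps.
    Optional absolute bounds (lower/upper) are applied afterwards.
    """
    proposed = np.asarray(proposed, dtype=float)
    previous = np.asarray(previous, dtype=float)

    # Largest allowed move for each well at this control step.
    max_step = max_rel_change * np.abs(previous)

    # Keep each proposed setting inside [previous - step, previous + step].
    limited = np.clip(proposed, previous - max_step, previous + max_step)

    # Enforce absolute operating limits only if the caller supplied any.
    if lower is not None or upper is not None:
        limited = np.clip(limited, lower, upper)
    return limited


# Hypothetical settings for three wells (e.g., bottom-hole pressures).
previous_settings = [300.0, 250.0, 100.0]
proposed_settings = [360.0, 240.0, 50.0]
print(apply_relative_change_limit(proposed_settings, previous_settings, 0.1))
```

With a 10% limit, the first and third wells are pulled back to the edge of their allowed windows, while the second well's proposed change is already small enough to pass through unchanged. Because the function operates elementwise on arrays of any length, the same sketch applies regardless of how many wells an asset has.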