Paper title
Uncertainty Quantification for Data-driven Turbulence Modelling with Mondrian Forests
Paper authors
Paper abstract
Data-driven turbulence modelling approaches are attracting increasing interest from the CFD community. However, introducing a machine learning (ML) model adds a new source of uncertainty: the ML model itself. Quantifying this uncertainty is essential, since the predictive capability of a data-driven model diminishes when it is asked to predict physics not seen during training. In this work, we explore the suitability of Mondrian forests (MFs) for data-driven turbulence modelling. MFs are claimed to possess many of the advantages of the commonly used random forest (RF) machine learning algorithm, whilst offering principled uncertainty estimates. An example test case is constructed, with a turbulence anisotropy constant derived from high-fidelity, turbulence-resolving simulations. Shapley values, borrowed from game theory, are used to interpret the MF predictions. Predictive uncertainty is found to be large in regions where the training data is not representative. Additionally, the MF predictive uncertainty is found to correlate more strongly with predictive errors than an a priori statistical distance measure does, indicating that it is a better measure of prediction confidence. The MF predictive uncertainty is also found to be better calibrated, and less computationally costly, than the uncertainty estimated by applying jackknifing to random forest predictions. Finally, Mondrian forests are used to predict the Reynolds discrepancies in a convergent-divergent channel, which are subsequently propagated through a modified CFD solver. The resulting flowfield predictions are in close agreement with the high-fidelity data. A procedure for sampling the Mondrian forests' uncertainties is introduced, and propagating these samples enables quantification of the uncertainty in output quantities of interest.
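The abstract's central idea is that every Mondrian forest prediction carries its own uncertainty estimate, which can then be sampled and propagated. As a minimal sketch, assuming each tree returns a Gaussian predictive mean and variance at a query point (as common Mondrian forest regression implementations do), the forest output can be treated as an equally weighted mixture of those Gaussians. Its mean is the average tree mean, and its variance follows from the law of total variance: the average tree variance plus the spread of the tree means. The names `forest_mean_and_std` and `sample_forest` are hypothetical, and the mixture-sampling scheme is an illustrative assumption, not necessarily the sampling procedure introduced in the paper.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

# Hypothetical per-tree output at one query point: (predictive mean, predictive variance).
TreePrediction = Tuple[float, float]


def forest_mean_and_std(tree_preds: Sequence[TreePrediction]) -> Tuple[float, float]:
    """Mean and standard deviation of an equally weighted Gaussian mixture over trees."""
    if not tree_preds:
        raise ValueError("need at least one tree prediction")
    n = len(tree_preds)
    mean = sum(m for m, _ in tree_preds) / n
    # Law of total variance: Var = E[var_t + mean_t^2] - mean^2
    second_moment = sum(v + m * m for m, v in tree_preds) / n
    variance = max(second_moment - mean * mean, 0.0)  # clip tiny negative round-off
    return mean, math.sqrt(variance)


def sample_forest(
    tree_preds: Sequence[TreePrediction],
    n_samples: int,
    rng: Optional[random.Random] = None,
) -> List[float]:
    """Sample the mixture: pick a tree uniformly at random, then draw from its Gaussian."""
    if rng is None:
        rng = random.Random()
    samples = []
    for _ in range(n_samples):
        m, v = rng.choice(tree_preds)
        samples.append(rng.gauss(m, math.sqrt(v)))
    return samples


if __name__ == "__main__":
    trees = [(0.0, 0.5), (2.0, 1.5)]
    mean, std = forest_mean_and_std(trees)
    print(mean, std)  # mean = 1.0, std = sqrt(2)
```

For example, two trees predicting (mean, variance) = (0, 0.5) and (2, 1.5) give a forest mean of 1 and a forest variance of 2: an average tree variance of 1 plus a variance of 1 from the disagreement between the tree means. This sketch treats a single query point in isolation.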