Paper title
Kalibre: Knowledge-based Neural Surrogate Model Calibration for Data Center Digital Twins
Paper authors
Paper abstract
Computational fluid dynamics (CFD) models have been widely used for prototyping data centers. Evolving them into high-fidelity {\em digital twins} is desirable for the management and operation of large-scale data centers. Manually calibrating CFD model parameters to achieve twin-class fidelity requires specially trained domain experts and is tedious and labor-intensive. To reduce manual effort, existing automatic calibration approaches developed for various computational models apply heuristics to search model configurations within empirically defined parameter bounds. However, in the context of CFD, each search step requires a long-running iterative solve of the CFD model, which makes these approaches impractical as model complexity increases. This paper presents Kalibre, a knowledge-based neural surrogate approach that performs CFD model calibration by iterating four key steps: i) training a neural surrogate model on CFD-generated data, ii) finding the currently optimal parameters by retraining the neural surrogate on sensor-measured data, iii) configuring the found parameters back into the CFD model, and iv) validating the CFD model using sensor-measured data as the ground truth. The parameter search is thus offloaded to the neural surrogate, which is far faster than iteratively solving the CFD model. To speed up the convergence of Kalibre, we integrate prior knowledge of the twinned data center's thermophysics into the neural surrogate design to improve its learning efficiency. With about five hours of computation on a 32-core processor, Kalibre achieves mean absolute errors (MAEs) of $0.81^\circ$C and $0.75^\circ$C in calibrating two CFD models for two production data halls, each hosting thousands of servers, while requiring fewer CFD solving processes than existing baseline approaches.
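The four-step loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation. The "CFD model" here is a cheap analytic function (`cfd_solve`, hypothetical), the surrogate is a local linear least-squares fit rather than a neural network, step ii is a direct least-squares solve rather than surrogate retraining, and the thermophysics prior is not modeled. All names, parameter values, bounds, and tolerances are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def cfd_solve(params):
    """Toy stand-in for an expensive CFD solve (hypothetical, not a real solver).
    Maps two model parameters to four simulated sensor temperatures in deg C."""
    a, b = params
    return np.array([
        20.0 + 3.0 * a + 0.5 * b**2,
        22.0 + 2.0 * a * b,
        25.0 + np.sin(a) + 4.0 * b,
        18.0 + a**2 + b,
    ])

# Hypothetical "true" physical parameters and the noise-free readings they produce.
TRUE_PARAMS = np.array([1.2, 0.7])
SENSOR_TEMPS = cfd_solve(TRUE_PARAMS)

def fit_surrogate(X, Y):
    """Step i: fit a linear surrogate Y ~ [X, 1] @ W on CFD-generated data.
    (A linear model replaces the neural surrogate in this sketch.)"""
    A = np.hstack([X, np.ones((len(X), 1))])
    W, *_ = np.linalg.lstsq(A, Y, rcond=None)
    return W

def search_on_surrogate(W, target, lo, hi):
    """Step ii: find the parameters whose surrogate output best matches the sensors."""
    J, c = W[:-1].T, W[-1]          # surrogate is temps = J @ p + c
    p, *_ = np.linalg.lstsq(J, target - c, rcond=None)
    return np.clip(p, lo, hi)

def calibrate(init, lo, hi, tol=0.05, max_rounds=20, n_samples=8, radius=0.3):
    params = np.asarray(init, dtype=float)
    n_solves = 0
    for _ in range(max_rounds):
        # Generate training data by running the (toy) CFD model near the current guess.
        offsets = rng.uniform(-radius, radius, size=(n_samples, params.size))
        X = np.clip(params + offsets, lo, hi)
        Y = np.array([cfd_solve(x) for x in X])
        n_solves += n_samples
        W = fit_surrogate(X, Y)                                  # step i
        params = search_on_surrogate(W, SENSOR_TEMPS, lo, hi)    # step ii
        simulated = cfd_solve(params)                            # step iii: configure back
        n_solves += 1
        mae = float(np.mean(np.abs(simulated - SENSOR_TEMPS)))   # step iv: validate
        if mae < tol:
            break
        radius *= 0.5   # sample more locally as the estimate improves
    return params, mae, n_solves

params, mae, n_solves = calibrate(init=[0.5, 0.5], lo=0.0, hi=2.0)
print(f"calibrated params={params}, MAE={mae:.4f} deg C, CFD solves={n_solves}")
```

The point of the structure is that the expensive solver is called only to generate training data and to validate a candidate, while the search itself runs on the cheap surrogate. Counting `n_solves` mirrors the abstract's comparison metric of CFD solving processes.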