Exploring validation metrics for offline model-based optimisation with diffusion models

Paper title

Exploring validation metrics for offline model-based optimisation with diffusion models

Authors

Christopher Beckham, Alexandre Piche, David Vazquez, Christopher Pal

Abstract

In model-based optimisation (MBO) we are interested in using machine learning to design candidates that maximise some measure of reward with respect to a black-box function called the (ground truth) oracle, which is expensive to compute since it involves executing a real-world process. In offline MBO we wish to do so without assuming access to such an oracle during training or validation, which makes evaluation non-straightforward. While an approximation to the ground truth oracle can be trained and used in its place during model validation to measure the mean reward over generated candidates, the evaluation is approximate and vulnerable to adversarial examples. Measuring the mean reward of generated candidates over this approximation is one such 'validation metric', whereas we are interested in a more fundamental question: which validation metrics correlate the most with the ground truth? Answering it involves proposing validation metrics and quantifying them over many datasets for which the ground truth is known, for instance simulated environments. This is encapsulated in our proposed evaluation framework, which is also designed to measure extrapolation, the ultimate goal behind leveraging generative models for MBO. While our evaluation framework is model agnostic, we specifically evaluate denoising diffusion models due to their state-of-the-art performance, and we derive interesting insights such as a ranking of the most effective validation metrics and a discussion of important hyperparameters.
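The central question of the abstract, how well a validation metric tracks the ground truth, can be illustrated with a small sketch. Everything here is an assumption made for illustration and is not taken from the paper: the one-dimensional design space, the functions `toy_oracle` and `toy_approx_oracle`, the Gaussian samplers standing in for trained generative models, and the choice of Pearson correlation as the agreement measure.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def toy_oracle(x):
    # Hypothetical ground truth oracle: reward peaks at x = 2.
    return -(x - 2.0) ** 2


def toy_approx_oracle(x):
    # Hypothetical learned approximation: its peak is misplaced at x = 1.5.
    return -(x - 1.5) ** 2


def mean_reward(candidates, reward_fn):
    """Mean reward of a set of generated candidates under a reward function."""
    return sum(reward_fn(c) for c in candidates) / len(candidates)


rng = random.Random(0)

# Each "model" is represented only by the candidates it generates.
# Gaussian samplers with different means stand in for trained generative models.
models = [
    [rng.gauss(mu, 0.3) for _ in range(200)]
    for mu in (0.0, 0.5, 1.0, 1.5, 2.0, 2.5)
]

# Validation metric: mean reward under the approximate oracle (usable offline).
val_scores = [mean_reward(m, toy_approx_oracle) for m in models]

# Ground truth: mean reward under the real oracle (known only in simulation).
true_scores = [mean_reward(m, toy_oracle) for m in models]

corr = pearson(val_scores, true_scores)
print(f"correlation between validation metric and ground truth: {corr:.3f}")
```

In this toy setting the approximate oracle ranks the models similarly to the true oracle but disagrees about which one is best, which is the kind of mismatch a correlation-based comparison of validation metrics is meant to expose.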
