Mitigating sampling bias in risk-based active learning via an EM algorithm

Title

Mitigating sampling bias in risk-based active learning via an EM algorithm

Authors

Hughes, Aidan J.; Bull, Lawrence A.; Gardner, Paul; Dervilis, Nikolaos; Worden, Keith

Abstract

Risk-based active learning is an approach to developing statistical classifiers for online decision support. In this approach, data-label querying is guided according to the expected value of perfect information for incipient data points. For structural health monitoring (SHM) applications, the value of information is evaluated with respect to a maintenance decision process, and a data-label query corresponds to an inspection of the structure to determine its health state. Sampling bias is a known issue within active-learning paradigms; it occurs when an active-learning process over- or undersamples specific regions of a feature space, so that the resulting training set is not representative of the underlying distribution. This bias ultimately degrades decision-making performance and, as a consequence, leads to unnecessary costs. The current paper outlines a risk-based approach to active learning that utilises a semi-supervised Gaussian mixture model. The semi-supervised approach counteracts sampling bias by incorporating pseudo-labels for unlabelled data via an EM (expectation-maximisation) algorithm. The approach is demonstrated on a numerical example representative of the decision processes found in SHM.
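The abstract combines two ingredients: label queries (inspections) guided by the expected value of perfect information (EVPI), and a Gaussian mixture fitted semi-supervised by EM so that unlabelled data contribute through pseudo-labels. The sketch below is a minimal, hypothetical illustration of both, not the authors' implementation. It assumes a one-dimensional feature, one Gaussian component per health state, labelled points whose responsibilities stay fixed (one-hot) while unlabelled points receive soft pseudo-labels in each E-step, and an invented two-state, two-action utility matrix and inspection cost. All function names, data values and costs are illustrative assumptions.

```python
import math

def log_gaussian(x, mean, var):
    """Log-density of a 1-D normal distribution N(mean, var) evaluated at x."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))

def posterior(x, weights, means, variances):
    """Class posterior p(y = k | x) under the mixture, computed in log space."""
    logs = [math.log(w) + log_gaussian(x, m, v)
            for w, m, v in zip(weights, means, variances)]
    top = max(logs)
    exps = [math.exp(l - top) for l in logs]
    total = sum(exps)
    return [e / total for e in exps]

def semi_supervised_em(labelled, unlabelled, n_classes, n_iter=50, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with one component per health state (hypothetical sketch).

    labelled:   (x, y) pairs; their responsibilities stay one-hot (observed labels).
    unlabelled: x values; each E-step assigns them soft pseudo-labels.
    Assumes every class has at least one labelled point.
    """
    xs_all = [x for x, _ in labelled] + list(unlabelled)
    grand_mean = sum(xs_all) / len(xs_all)
    pooled_var = max(sum((x - grand_mean) ** 2 for x in xs_all) / len(xs_all), min_var)
    # Initialise means from labelled class averages, with a shared broad variance.
    means = []
    for k in range(n_classes):
        xs_k = [x for x, y in labelled if y == k]
        means.append(sum(xs_k) / len(xs_k))
    variances = [pooled_var] * n_classes
    weights = [1.0 / n_classes] * n_classes

    fixed = [(x, [1.0 if k == y else 0.0 for k in range(n_classes)]) for x, y in labelled]
    for _ in range(n_iter):
        # E-step: soft pseudo-labels for the unlabelled points only.
        soft = [(x, posterior(x, weights, means, variances)) for x in unlabelled]
        data = fixed + soft
        # M-step: responsibility-weighted updates over labelled + unlabelled data.
        for k in range(n_classes):
            n_k = sum(r[k] for _, r in data)
            weights[k] = n_k / len(data)
            means[k] = sum(r[k] * x for x, r in data) / n_k
            variances[k] = max(sum(r[k] * (x - means[k]) ** 2 for x, r in data) / n_k, min_var)
    return weights, means, variances

def evpi(probs, utility):
    """Expected value of perfect information for a single data point.

    utility[d][k] is the utility of decision d when the true health state is k.
    """
    # Best expected utility when acting on the current posterior alone.
    current = max(sum(p * u for p, u in zip(probs, row)) for row in utility)
    # Expected utility if the true state were revealed (e.g. by inspection) first.
    perfect = sum(p * max(row[k] for row in utility) for k, p in enumerate(probs))
    return perfect - current

# Hypothetical example: state 0 = healthy, state 1 = damaged.
labelled = [(-0.2, 0), (0.1, 0), (3.1, 1)]          # few labels, class 1 undersampled
unlabelled = [-0.5, 0.0, 0.3, 0.4, 2.8, 3.0, 3.3, 2.9, 3.2, 2.7]
weights, means, variances = semi_supervised_em(labelled, unlabelled, n_classes=2)

# Decisions: row 0 = do nothing, row 1 = repair (illustrative utilities).
utility = [[0.0, -100.0],
           [-10.0, -10.0]]
inspection_cost = 1.0  # assumed cost of one label query (inspection)
to_inspect = [x for x in unlabelled
              if evpi(posterior(x, weights, means, variances), utility) > inspection_cost]
```

In this sketch a point is flagged for inspection only when its EVPI exceeds the assumed inspection cost; in a sequential setting the newly inspected point would be appended to `labelled` and the mixture refitted. Keeping the unlabelled points in the M-step means the fitted components are shaped by all observed data, not only by the regions the query rule happened to sample, which is the mechanism the abstract credits for counteracting sampling bias.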
