Paper Title
Uncertainty Estimation in Machine Learning
Paper Authors
Paper Abstract
Most machine learning techniques are based upon statistical learning theory, often simplified for the sake of computing speed. This paper focuses on the uncertainty aspect of mathematical modeling in machine learning. Regression analysis is chosen to investigate the evaluation of uncertainty in model coefficients and, more importantly, in predictions of the output feature values. A survey covers the major stages of the conventional least squares approach to building a regression model, along with its uncertainty estimation. On the other hand, it is shown that in machine learning the model complexity and severe nonlinearity become serious obstacles to uncertainty evaluation. Furthermore, training machine learning models demands computing power beyond what is available on personal computers. This is why so-called pre-trained models are widely used in areas of machine learning such as natural language processing. A recent example of a pre-trained model is the Generative Pre-trained Transformer 3, with hundreds of billions of parameters and a half-terabyte training dataset. Similarly, mathematical models built from real data are growing in complexity, accompanied by growing amounts of training data. However, when machine models and their predictions are used in decision-making, one needs to estimate uncertainty and evaluate the accompanying risks. This problem can be addressed with non-parametric techniques, at the expense of a greater demand for computing power, which modern supercomputers can provide, including those that combine graphics and tensor processing units with conventional central processing units.
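The following is a minimal sketch, not code from the paper, contrasting the two approaches named in the abstract: the analytic least-squares uncertainty estimates for regression coefficients and predictions, and a non-parametric (bootstrap) alternative that trades extra computation for fewer distributional assumptions. The data, model form, and all variable names are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data for a simple linear model y = 2 + 3x + noise (assumed example)
n = 50
x = rng.uniform(0.0, 10.0, n)
y = 2.0 + 3.0 * x + rng.normal(0.0, 1.5, n)

# Conventional least squares: design matrix with an intercept column
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
dof = n - X.shape[1]
sigma2 = resid @ resid / dof                      # residual variance estimate

# Parametric coefficient uncertainty: covariance sigma^2 * (X'X)^-1
XtX_inv = np.linalg.inv(X.T @ X)
cov_beta = sigma2 * XtX_inv
se_beta = np.sqrt(np.diag(cov_beta))
print("coefficients:", beta)
print("standard errors:", se_beta)

# Prediction variance for a new observation at x0: sigma^2 * (1 + x0' (X'X)^-1 x0)
x0 = np.array([1.0, 5.0])
pred = x0 @ beta
pred_var = sigma2 * (1.0 + x0 @ XtX_inv @ x0)
print("prediction at x=5:", pred, "+/-", 2.0 * np.sqrt(pred_var))

# Non-parametric alternative: bootstrap resampling of (x, y) pairs.
# Heavier on computation, but makes no normality assumption on the noise.
B = 2000
boot_preds = np.empty(B)
for b in range(B):
    idx = rng.integers(0, n, n)
    beta_b, *_ = np.linalg.lstsq(X[idx], y[idx], rcond=None)
    boot_preds[b] = x0 @ beta_b
lo, hi = np.percentile(boot_preds, [2.5, 97.5])
print("bootstrap 95% interval for the mean response at x=5:", (lo, hi))
```

The bootstrap loop is the part that scales poorly: each resample repeats the full model fit, which is exactly the kind of workload the abstract suggests offloading to hardware accelerators when the model is far more complex than this toy regression.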