Paper Title
A Note on High-Probability versus In-Expectation Guarantees of Generalization Bounds in Machine Learning
Paper Authors
Paper Abstract
Statistical machine learning theory often aims to provide generalization guarantees for machine learning models. Such models are naturally subject to some fluctuation, as they are based on a data sample. If we are unlucky and gather a sample that is not representative of the underlying distribution, we cannot expect to construct a reliable machine learning model. Consequently, statements about the performance of machine learning models have to take the sampling process into account. The two common approaches are to make statements that hold either with high probability, or in expectation, over the random sampling process. In this short note we show how one may transform one type of statement into the other. As a technical novelty, we address the case of unbounded loss functions, where we use a fairly new assumption called the witness condition.
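As a rough illustration of how such a transformation can work (a standard textbook sketch under simplifying assumptions, not necessarily the note's actual argument, which also handles unbounded losses via the witness condition), let X ≥ 0 denote a nonnegative quantity such as the excess risk of a learned model:

% (1) In-expectation => high-probability, via Markov's inequality:
%     if \mathbb{E}[X] \le B, then for any \delta \in (0,1),
\[
  \Pr\!\left[ X \ge \tfrac{B}{\delta} \right] \le \delta .
\]

% (2) High-probability => in-expectation, by integrating the tail:
%     if \Pr[X > f(\delta)] \le \delta for every \delta \in (0,1],
%     with f nonincreasing, then
\[
  \mathbb{E}[X] \;=\; \int_0^\infty \Pr[X > t]\, dt \;\le\; \int_0^1 f(\delta)\, d\delta .
\]

Note the asymmetry these sketches already exhibit: direction (1) inflates the bound by a factor 1/δ, while direction (2) requires the high-probability bound to hold simultaneously for all confidence levels δ; both rely on X being nonnegative, which is why the unbounded-loss case needs an extra assumption.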