Paper Title
A Distributionally Robust Area Under Curve Maximization Model
Authors
Abstract
The area under the ROC curve (AUC) is a widely used performance measure for classification models. We propose two new distributionally robust AUC maximization models (DR-AUC) that rely on the Kantorovich metric and approximate the AUC with the hinge loss function. We consider two cases, with fixed and variable support for the worst-case distribution, respectively. We use duality theory to reformulate the DR-AUC models and derive tractable convex optimization problems. Numerical experiments show that the proposed DR-AUC models, benchmarked against the standard deterministic AUC and support vector machine models, perform better in general and, in particular, improve the worst-case out-of-sample performance on the majority of the datasets considered, thereby demonstrating their robustness. The results are particularly encouraging because the experiments use small training sets, which are known to be conducive to low out-of-sample performance.
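To make the hinge-loss approximation mentioned in the abstract concrete, the following minimal sketch compares the empirical AUC (the fraction of correctly ranked positive/negative pairs) with a pairwise hinge surrogate. It illustrates only the standard deterministic surrogate, not the DR-AUC reformulation itself. The function names, the unit margin, and the toy scores are assumptions made for illustration and are not taken from the paper.

```python
from itertools import product


def empirical_auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count as 0.5."""
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            total += 1.0
        elif p == n:
            total += 0.5
    return total / (len(pos_scores) * len(neg_scores))


def pairwise_hinge_loss(pos_scores, neg_scores, margin=1.0):
    """Average of max(0, margin - (p - n)) over all (positive, negative) pairs.

    With margin = 1 this convex quantity upper-bounds the pairwise
    misranking rate 1 - AUC, which is why it can serve as a surrogate.
    """
    total = 0.0
    for p, n in product(pos_scores, neg_scores):
        total += max(0.0, margin - (p - n))
    return total / (len(pos_scores) * len(neg_scores))


# Toy scores (hypothetical): two positive and two negative examples.
pos = [2.0, 0.5]
neg = [0.0, 1.0]

auc = empirical_auc(pos, neg)
hinge = pairwise_hinge_loss(pos, neg)
print(auc, hinge)
```

In this toy case three of the four pairs are ranked correctly, and the hinge surrogate also penalizes the correctly ranked pair whose score gap is smaller than the margin, so the surrogate is no smaller than 1 - AUC.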