Shifts 2.0: Extending The Dataset of Real Distributional Shifts

Paper Title

Shifts 2.0: Extending The Dataset of Real Distributional Shifts

Authors

Malinin, Andrey; Athanasopoulos, Andreas; Barakovic, Muhamed; Cuadra, Meritxell Bach; Gales, Mark J. F.; Granziera, Cristina; Graziani, Mara; Kartashev, Nikolay; Kyriakopoulos, Konstantinos; Lu, Po-Jui; Molchanova, Nataliia; Nikitakis, Antonis; Raina, Vatsal; La Rosa, Francesco; Sivena, Eli; Tsarsitalidis, Vasileios; Tsompopoulou, Efi; Volf, Elena

Abstract

Distributional shift, or the mismatch between training and deployment data, is a significant obstacle to the usage of machine learning in high-stakes industrial applications, such as autonomous driving and medicine. This creates a need to be able to assess how robustly ML models generalize as well as the quality of their uncertainty estimates. Standard ML baseline datasets do not allow these properties to be assessed, as the training, validation and test data are often identically distributed. Recently, a range of dedicated benchmarks have appeared, featuring both distributionally matched and shifted data. Among these benchmarks, the Shifts dataset stands out in terms of the diversity of tasks as well as the data modalities it features. While most of the benchmarks are heavily dominated by 2D image classification tasks, Shifts contains tabular weather forecasting, machine translation, and vehicle motion prediction tasks. This enables the robustness properties of models to be assessed on a diverse set of industrial-scale tasks and either universal or directly applicable task-specific conclusions to be reached. In this paper, we extend the Shifts Dataset with two datasets sourced from industrial, high-risk applications of high societal importance. Specifically, we consider the tasks of segmentation of white matter Multiple Sclerosis lesions in 3D magnetic resonance brain images and the estimation of power consumption in marine cargo vessels. Both tasks feature ubiquitous distributional shifts and a strict safety requirement due to the high cost of errors. These new datasets will allow researchers to further explore robust generalization and uncertainty estimation in new situations. In this work, we provide a description of the dataset and baseline results for both tasks.
