Everything is Varied: The Surprising Impact of Individual Variation on ML Robustness in Medicine

Paper title

Everything is Varied: The Surprising Impact of Individual Variation on ML Robustness in Medicine

Authors

Andrea Campagner, Lorenzo Famiglini, Anna Carobene, Federico Cabitza

Abstract

In medical settings, Individual Variation (IV) refers to variation that is due not to population differences or errors, but rather to within-subject variation, that is, the intrinsic and characteristic patterns of variation pertaining to a given instance or the measurement process. While taking IV into account has been deemed critical for the proper analysis of medical data, this source of uncertainty and its impact on robustness have so far been neglected in Machine Learning (ML). To fill this gap, we look at how IV affects ML performance and generalization and how its impact can be mitigated. Specifically, we provide a methodological contribution to formalize the problem of IV in the statistical learning framework and, through an experiment based on one of the largest real-world laboratory medicine datasets for the problem of COVID-19 diagnosis, we show that: 1) common state-of-the-art ML models are severely impacted by the presence of IV in data; and 2) advanced learning strategies, based on data augmentation and data imprecisiation, and proper study designs can be effective at improving robustness to IV. Our findings demonstrate the critical relevance of correctly accounting for IV to enable safe deployment of ML in clinical settings.
