Paper Title
DIAGNOSE: Avoiding Out-of-distribution Data using Submodular Information Measures
Paper Authors
Paper Abstract
Avoiding out-of-distribution (OOD) data is critical for training supervised machine learning models in the medical imaging domain. Furthermore, obtaining labeled medical data is difficult and expensive since it requires expert annotators like doctors, radiologists, etc. Active learning (AL) is a well-known method to mitigate labeling costs by selecting the most diverse or uncertain samples. However, current AL methods do not work well in the medical imaging domain with OOD data. We propose Diagnose (avoiDing out-of-dIstribution dAta usinG submodular iNfOrmation meaSurEs), a novel active learning framework that can jointly model similarity and dissimilarity, which is crucial in mining in-distribution data and avoiding OOD data at the same time. Particularly, we use a small number of data points as exemplars that represent a query set of in-distribution data points and a private set of OOD data points. We illustrate the generalizability of our framework by evaluating it on a wide variety of real-world OOD scenarios. Our experiments verify the superiority of Diagnose over the state-of-the-art AL methods across multiple domains of medical imaging.
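The idea of jointly rewarding similarity to in-distribution exemplars and penalizing similarity to OOD exemplars can be illustrated with a small greedy selection loop. This is only a rough sketch under stated assumptions, not the submodular information measures used by the paper: the function name `select_batch`, the `penalty` weight, the cosine similarity, and the facility-location-style coverage term are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def select_batch(unlabeled, query, private, budget, penalty=1.0):
    """Greedily pick up to `budget` indices from `unlabeled` (hypothetical scoring).

    The gain of a candidate is the improvement in how well the selected set
    covers the query (in-distribution) exemplars, minus `penalty` times the
    candidate's highest similarity to any private (OOD) exemplar.
    """
    coverage = [0.0] * len(query)  # best similarity so far for each query exemplar
    selected = []
    remaining = set(range(len(unlabeled)))
    while remaining and len(selected) < budget:
        best_i, best_gain = None, -math.inf
        for i in sorted(remaining):  # sorted for deterministic tie-breaking
            x = unlabeled[i]
            sims = [cosine(x, q) for q in query]
            cover_gain = sum(max(s - c, 0.0) for s, c in zip(sims, coverage))
            ood_sim = max((cosine(x, p) for p in private), default=0.0)
            gain = cover_gain - penalty * ood_sim
            if gain > best_gain:
                best_i, best_gain = i, gain
        selected.append(best_i)
        remaining.remove(best_i)
        chosen = unlabeled[best_i]
        coverage = [max(c, cosine(chosen, q)) for c, q in zip(coverage, query)]
    return selected


# Toy 2-D features: points 0 and 1 lie near the query exemplar,
# points 2 and 3 lie near the private (OOD) exemplar.
unlabeled = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
query = [[1.0, 0.0]]
private = [[0.0, 1.0]]
picked = select_batch(unlabeled, query, private, budget=2)
```

In an active learning loop, the selected indices would be sent to annotators, the model retrained, and the features recomputed before the next round. The exemplar sets would typically be small, as the abstract describes.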