Real-World Multi-Domain Data Applications for Generalizations to Clinical Settings

Paper Title

Real-World Multi-Domain Data Applications for Generalizations to Clinical Settings

Authors

Mojab, Nooshin, Noroozi, Vahid, Yi, Darvin, Nallabothula, Manoj Prabhakar, Aleem, Abdullah, Yu, Phillip S., Hallak, Joelle A.

Abstract

With the promising results of machine-learning-based models in computer vision, applications to medical imaging data have been increasing exponentially. However, generalization to complex real-world clinical data remains a persistent problem. Deep learning models perform well when trained on standardized datasets from artificial settings, such as clinical trials, but real-world data differs from such datasets, and translation to clinical practice yields varying results. The complexity of real-world healthcare applications can stem from a mixture of different data distributions across multiple device domains, alongside the inevitable noise arising from varying image resolutions, human error, and the lack of manual grading. In addition, healthcare applications not only suffer from the scarcity of labeled data but also face limited access to unlabeled data, owing to HIPAA regulations, patient privacy, ambiguity in data ownership, and the difficulty of collecting data from different sources. These limitations pose additional challenges to applying deep learning algorithms in healthcare and to their clinical translation. In this paper, we utilize self-supervised representation learning methods, formulated effectively in transfer learning settings, to address limited data availability. Our experiments verify the importance of diverse real-world data for generalization to clinical settings. We show that by employing a self-supervised approach with transfer learning on a multi-domain real-world dataset, we can achieve a 16% relative improvement on a standardized dataset over supervised baselines.
