Paper Title

That's the Wrong Lung! Evaluating and Improving the Interpretability of Unsupervised Multimodal Encoders for Medical Data

Authors

McInerney, Denis Jered, Young, Geoffrey, van de Meent, Jan-Willem, Wallace, Byron C.

Abstract

Pretraining multimodal models on Electronic Health Records (EHRs) provides a means of learning representations that can transfer to downstream tasks with minimal supervision. Recent multimodal models induce soft local alignments between image regions and sentences. This is of particular interest in the medical domain, where alignments might highlight regions in an image relevant to specific phenomena described in free-text. While past work has suggested that attention "heatmaps" can be interpreted in this manner, there has been little evaluation of such alignments. We compare alignments from a state-of-the-art multimodal (image and text) model for EHR with human annotations that link image regions to sentences. Our main finding is that the text has an often weak or unintuitive influence on attention; alignments do not consistently reflect basic anatomical information. Moreover, synthetic modifications -- such as substituting "left" for "right" -- do not substantially influence highlights. Simple techniques such as allowing the model to opt out of attending to the image and few-shot finetuning show promise in terms of their ability to improve alignments with very little or no supervision. We make our code and checkpoints open-source.
