Paper Title
Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study
Paper Authors
Paper Abstract
Large-scale pre-trained vision-language models (VLMs) have shown remarkable domain-transfer capability on natural images. However, it remains unknown whether this capability extends to the medical image domain. This paper thoroughly studies the knowledge transferability of pre-trained VLMs to the medical domain, where we show that well-designed medical prompts are the key to eliciting knowledge from pre-trained VLMs. We demonstrate that by prompting with expressive attributes shared between domains, a VLM can carry knowledge across domains and improve its generalization. This mechanism empowers VLMs to recognize novel objects with few or no image samples. Furthermore, to avoid the laborious manual design process, we develop three approaches for the automatic generation of medical prompts, which inject expert-level medical knowledge and image-specific information into the prompts for fine-grained grounding. We conduct extensive experiments on thirteen medical datasets across various modalities, showing that our well-designed prompts greatly improve zero-shot performance compared to default prompts, and that our fine-tuned models surpass supervised models by a significant margin.
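To make the prompting mechanism concrete, the following is a minimal zero-shot sketch. It assumes a generic CLIP-style model loaded via Hugging Face transformers (openai/clip-vit-base-patch32); the image path and the attribute-rich prompt texts are illustrative placeholders, not the paper's actual prompts or its exact model.

```python
# Minimal zero-shot sketch: scoring a medical image against a default
# prompt vs. an attribute-rich medical prompt (illustrative texts only).
# Assumes a generic CLIP-style model; the paper's own model and prompts may differ.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical prompts: the second injects shared visual attributes
# (color, shape, texture) that the abstract argues carry knowledge across domains.
prompts = [
    "a photo of a polyp",                                # default prompt
    "a pinkish oval polyp with a smooth bumpy surface",  # attribute-rich prompt
    "a photo of normal colon tissue",                    # negative class
]

image = Image.open("endoscopy_sample.png")  # placeholder image path
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)

with torch.no_grad():
    outputs = model(**inputs)

# A higher probability on the attribute-rich prompt would indicate that the
# shared attributes ground the target class better than the default template.
probs = outputs.logits_per_image.softmax(dim=-1)
for text, p in zip(prompts, probs[0].tolist()):
    print(f"{p:.3f}  {text}")
```

The same comparison extends to the paper's automatically generated prompts: replacing the hand-written attribute string with text produced from expert knowledge or image-specific features is what the three proposed generation approaches automate.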