Paper Title

Global-Reasoned Multi-Task Learning Model for Surgical Scene Understanding

Paper Authors

Lalithkumar Seenivasan, Sai Mitheran, Mobarakol Islam, Hongliang Ren

Paper Abstract

Global and local relational reasoning enable scene understanding models to perform human-like scene analysis and understanding. Scene understanding enables better semantic segmentation and object-to-object interaction detection. In the medical domain, a robust surgical scene understanding model allows the automation of surgical skill evaluation, real-time monitoring of a surgeon's performance, and post-surgical analysis. This paper introduces a globally-reasoned multi-task surgical scene understanding model capable of performing instrument segmentation and tool-tissue interaction detection. Here, we incorporate global relational reasoning in the latent interaction space and introduce multi-scale local (neighborhood) reasoning in the coordinate space to improve segmentation. Utilizing the multi-task model setup, the performance of the visual-semantic graph attention network in interaction detection is further enhanced through global reasoning. The global interaction space features from the segmentation module are introduced into the graph network, allowing it to detect interactions based on both node-to-node and global interaction reasoning. By sharing common modules, our model reduces computation cost compared to running two independent single-task models, which is indispensable for practical applications. Using a sequential optimization technique, the proposed multi-task model outperforms other state-of-the-art single-task models on the MICCAI Endoscopic Vision Challenge 2018 dataset. Additionally, we observe the performance of the multi-task model when trained using the knowledge distillation technique. The official code implementation is made available on GitHub.
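The core mechanism the abstract describes — projecting coordinate-space features into a latent interaction space, reasoning over a small fully connected graph there, and projecting back — follows the GloRe-style global reasoning pattern. Below is a minimal PyTorch sketch of such a unit, not the paper's official implementation; the module and parameter names (`GlobalReasoningUnit`, `latent_nodes`, `latent_dim`) are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GlobalReasoningUnit(nn.Module):
    """Sketch of a GloRe-style unit: project coordinate-space features into a
    latent interaction space, reason over the fully connected latent graph
    with 1x1 convolutions, then project back and fuse residually."""

    def __init__(self, in_channels: int, latent_nodes: int = 16, latent_dim: int = 64):
        super().__init__()
        self.reduce = nn.Conv2d(in_channels, latent_dim, kernel_size=1)     # feature reduction
        self.project = nn.Conv2d(in_channels, latent_nodes, kernel_size=1)  # projection weights
        self.gcn_node = nn.Conv1d(latent_nodes, latent_nodes, kernel_size=1)    # across nodes
        self.gcn_channel = nn.Conv1d(latent_dim, latent_dim, kernel_size=1)     # across channels
        self.expand = nn.Conv2d(latent_dim, in_channels, kernel_size=1)     # back to coordinate space

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        n, _, h, w = x.shape
        v = self.reduce(x).flatten(2)                # (n, latent_dim, h*w)
        b = self.project(x).flatten(2)               # (n, latent_nodes, h*w)
        z = torch.bmm(b, v.transpose(1, 2))          # latent graph nodes: (n, latent_nodes, latent_dim)
        z = z + self.gcn_node(z)                     # relational reasoning across latent nodes
        z = self.gcn_channel(z.transpose(1, 2)).transpose(1, 2)  # reasoning across channels
        y = torch.bmm(z.transpose(1, 2), b)          # reverse projection: (n, latent_dim, h*w)
        return x + self.expand(y.view(n, -1, h, w))  # residual fusion with input features
```

In the multi-task setup described above, the latent interaction-space features produced by such a unit would be shared with the graph attention network of the interaction-detection branch rather than kept private to the segmentation decoder.

The knowledge distillation training mentioned at the end of the abstract can likewise be sketched as the standard soft-target loss; the temperature `T` and blend weight `alpha` are assumed hyperparameters, not values reported by the paper.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, targets, T=4.0, alpha=0.5):
    """Hinton-style knowledge distillation: blend a KL loss against the
    teacher's temperature-softened outputs with the usual hard-label loss."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                              # rescale gradients by T^2
    hard = F.cross_entropy(student_logits, targets)
    return alpha * soft + (1.0 - alpha) * hard
```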
