Paper Title


SFusion: Self-attention based N-to-One Multimodal Fusion Block

Paper Authors

Zecheng Liu, Jia Wei, Rui Li, Jianlong Zhou

Paper Abstract


People perceive the world with different senses, such as sight, hearing, smell, and touch. Processing and fusing information from multiple modalities enables Artificial Intelligence to understand the world around us more easily. However, when there are missing modalities, the number of available modalities differs across situations, which leads to an N-to-One fusion problem. To solve this problem, we propose a self-attention based fusion block called SFusion. Different from preset formulations or convolution based methods, the proposed block automatically learns to fuse the available modalities without synthesizing or zero-padding the missing ones. Specifically, the feature representations extracted by the upstream processing model are projected as tokens and fed into a self-attention module to generate latent multimodal correlations. Then, a modal attention mechanism is introduced to build a shared representation, which can be applied by the downstream decision model. The proposed SFusion can be easily integrated into existing multimodal analysis networks. In this work, we apply SFusion to different backbone networks for human activity recognition and brain tumor segmentation tasks. Extensive experimental results show that the SFusion block achieves better performance than the competing fusion strategies. Our code is available at https://github.com/scut-cszcl/SFusion.
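The abstract outlines a three-step pipeline: modality features are projected as tokens, a self-attention module produces latent multimodal correlations, and a modal attention mechanism weights the modalities into one shared representation. A minimal NumPy sketch of that idea is shown below. All weight matrices and function names here are illustrative assumptions, not the authors' actual implementation (which is in the linked repository and would use learned parameters); the point is only that the same code path handles any number N of available modalities, with no synthesis or zero-padding of missing ones.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def sfusion_sketch(modality_feats, rng):
    """Fuse a variable-length list of modality features, each of shape
    (T, D), into a single shared representation of shape (T, D).

    Hypothetical parameterization: in a real model W_q, W_k, W_v and the
    modal-attention scorer w would be learned, not randomly drawn.
    """
    d = modality_feats[0].shape[-1]
    W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    w = rng.standard_normal(d) / np.sqrt(d)

    # 1) Project each available modality's features as tokens and concatenate.
    tokens = np.concatenate(modality_feats, axis=0)        # (N*T, D)

    # 2) Self-attention across all tokens -> latent multimodal correlations.
    Q, K, V = tokens @ W_q, tokens @ W_k, tokens @ W_v
    attn = softmax(Q @ K.T / np.sqrt(d), axis=-1)
    corr = attn @ V                                        # (N*T, D)

    # 3) Modal attention: score each modality and build a weighted
    #    shared representation over however many modalities arrived.
    n, t = len(modality_feats), modality_feats[0].shape[0]
    per_mod = corr.reshape(n, t, d)                        # (N, T, D)
    alpha = softmax(per_mod.mean(axis=1) @ w)              # (N,) modality weights
    shared = np.tensordot(alpha, per_mod, axes=1)          # (T, D)
    return shared
```

Because the fusion is attention-based rather than a fixed formula over a preset modality count, calling `sfusion_sketch` with two modalities or with three yields a shared representation of the same shape, which is what lets the block slot between arbitrary upstream encoders and a downstream decision model.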
