Paper Title


Are Multimodal Transformers Robust to Missing Modality?

Paper Authors

Mengmeng Ma, Jian Ren, Long Zhao, Davide Testuggine, Xi Peng

Paper Abstract


Multimodal data collected from the real world are often imperfect due to missing modalities. Therefore, multimodal models that are robust against modal-incomplete data are highly preferred. Recently, Transformer models have shown great success in processing multimodal data. However, existing work has been limited to either architecture designs or pre-training strategies; whether Transformer models are naturally robust against missing-modal data has rarely been investigated. In this paper, we present the first-of-its-kind work to comprehensively investigate the behavior of Transformers in the presence of modal-incomplete data. Unsurprisingly, we find that Transformer models are sensitive to missing modalities, and that different modal fusion strategies significantly affect the robustness. What surprised us is that the optimal fusion strategy is dataset-dependent even for the same Transformer model; there does not exist a universal strategy that works in general cases. Based on these findings, we propose a principled method to improve the robustness of Transformer models by automatically searching for an optimal fusion strategy regarding the input data. Experimental validation on three benchmarks supports the superior performance of the proposed method.
