Paper Title


Multimodal Information Bottleneck: Learning Minimal Sufficient Unimodal and Multimodal Representations

Authors

Sijie Mai, Ying Zeng, Haifeng Hu

Abstract


Learning an effective joint embedding for cross-modal data has always been a focus in the field of multimodal machine learning. We argue that during multimodal fusion, the generated multimodal embedding may be redundant, and discriminative unimodal information may be ignored, which often interferes with accurate prediction and leads to a higher risk of overfitting. Moreover, unimodal representations also contain noisy information that negatively influences the learning of cross-modal dynamics. To this end, we introduce the multimodal information bottleneck (MIB), aiming to learn a powerful and sufficient multimodal representation that is free of redundancy and to filter out noisy information in unimodal representations. Specifically, inheriting from the general information bottleneck (IB), MIB aims to learn the minimal sufficient representation for a given task by maximizing the mutual information between the representation and the target while simultaneously constraining the mutual information between the representation and the input data. Different from the general IB, our MIB regularizes both the multimodal and unimodal representations, yielding a comprehensive and flexible framework that is compatible with any fusion method. We develop three MIB variants, namely, early-fusion MIB, late-fusion MIB, and complete MIB, to focus on different perspectives of information constraints. Experimental results suggest that the proposed method reaches state-of-the-art performance on the tasks of multimodal sentiment analysis and multimodal emotion recognition across three widely used datasets. The code is available at https://github.com/TmacMai/Multimodal-Information-Bottleneck.
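The IB objective the abstract describes (maximize mutual information between the representation and the target while constraining mutual information with the input) is, in practice, usually optimized through a variational bound: a cross-entropy term for sufficiency plus a KL term penalizing the encoder's posterior against a standard normal prior. The sketch below is a minimal NumPy illustration of that generic variational IB loss, not the paper's actual implementation; the function names, the Gaussian-posterior assumption, and the `beta` weight are illustrative.

```python
import numpy as np

def kl_standard_normal(mu, logvar):
    # KL( N(mu, diag(exp(logvar))) || N(0, I) ): the compression term,
    # an upper bound on I(z; x) under the variational IB formulation.
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar, axis=-1)

def cross_entropy(logits, target):
    # -log p(y|z): the sufficiency term, whose minimization
    # corresponds to maximizing (a lower bound on) I(z; y).
    logits = logits - logits.max(axis=-1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    return -log_probs[np.arange(len(target)), target]

def vib_loss(logits, target, mu, logvar, beta=1e-3):
    # Variational IB objective: sufficiency + beta * compression.
    return np.mean(cross_entropy(logits, target) + beta * kl_standard_normal(mu, logvar))
```

In the paper's framing, the MIB variants differ in where such a constraint is applied: early-fusion MIB regularizes the fused multimodal representation, late-fusion MIB regularizes the unimodal representations before fusion, and complete MIB applies both.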
