Masked Reconstruction Contrastive Learning with Information Bottleneck Principle

Paper Title

Masked Reconstruction Contrastive Learning with Information Bottleneck Principle

Authors

Ziwen Liu, Bonan Li, Congying Han, Tiande Guo, Xuecheng Nie

Abstract

Contrastive learning (CL) has shown great power in self-supervised learning because it can capture insightful correlations in large-scale data. Because of their discriminative task setting, current CL models are biased toward learning only to discriminate positive pairs from negative pairs. This bias causes the learned representation to neglect sufficiency for other downstream tasks, which we call the discriminative information overfitting problem. In this paper, we tackle this problem from the perspective of the Information Bottleneck (IB) principle, pushing forward the frontier of CL. Specifically, we present a new perspective in which CL is an instantiation of the IB principle, comprising information compression and information expression. We theoretically analyze the optimal information situation and show that minimum sufficient augmentation and information-generalized representation are the optimal requirements for achieving maximum compression and generalizability to downstream tasks. We therefore propose the Masked Reconstruction Contrastive Learning (MRCL) model to improve CL models. In practice, MRCL uses a masking operation as a stronger augmentation, further eliminating redundant and noisy information. To alleviate the discriminative information overfitting problem, we employ a reconstruction task to regularize the discriminative task. We conduct comprehensive experiments and show the superiority of the proposed model on multiple tasks, including image classification, semantic segmentation, and object detection.
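
The abstract names two ingredients: masking as a stronger augmentation, and a reconstruction task that regularizes the discriminative (contrastive) task. The following is a minimal sketch of how such a combination could look, not the authors' implementation. Everything specific here is an assumption made for illustration: the function names, the patch-wise zero masking, the InfoNCE form of the contrastive loss, the mean-squared-error reconstruction term, and the parameters `patch`, `mask_ratio`, `temperature`, and `recon_weight`. The abstract does not specify any of these.

```python
import numpy as np


def random_patch_mask(images, patch=4, mask_ratio=0.5, rng=None):
    """Zero out a random subset of non-overlapping patches (hypothetical masking).

    images: array of shape (N, H, W); H and W must be divisible by `patch`.
    Returns the masked images and a boolean pixel mask (True = pixel kept).
    """
    rng = np.random.default_rng() if rng is None else rng
    n, h, w = images.shape
    # One keep/drop decision per patch.
    keep = rng.random((n, h // patch, w // patch)) >= mask_ratio
    # Expand the patch-level decisions to pixel resolution.
    pixel_keep = np.repeat(np.repeat(keep, patch, axis=1), patch, axis=2)
    return images * pixel_keep, pixel_keep


def info_nce(z1, z2, temperature=0.1, eps=1e-12):
    """InfoNCE loss (an assumed choice of contrastive objective).

    Row i of z1 and row i of z2 form the positive pair; all other rows of z2
    act as negatives for row i of z1.
    """
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + eps)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + eps)
    logits = z1 @ z2.T / temperature
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))


def combined_loss(z1, z2, reconstruction, target, recon_weight=1.0):
    """Contrastive loss plus a weighted reconstruction penalty (assumed MSE)."""
    contrastive = info_nce(z1, z2)
    recon = np.mean((reconstruction - target) ** 2)
    return contrastive + recon_weight * recon
```

In this sketch, `recon_weight` controls how strongly the reconstruction term regularizes the contrastive term; setting it to zero recovers a purely discriminative objective. A real model would compute `z1`, `z2`, and `reconstruction` from encoder and decoder networks, which are omitted here.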
