Paper Title

GMML is All you Need

Paper Authors

Sara Atito, Muhammad Awais, Josef Kittler

Paper Abstract

Vision transformers have generated significant interest in the computer vision community because of their flexibility in exploiting contextual information, whether sharply confined and local or long-range and global. However, they are known to be data-hungry. This has motivated research into self-supervised transformer pretraining, which does not need to decode the semantic information conveyed by labels to link it to image properties, but instead focuses directly on extracting a concise representation of the image data that reflects the notion of similarity and is invariant to nuisance factors. The key vehicle for the self-learning process used by the majority of self-supervised methods is the generation of multiple views of the training data and the creation of pretext tasks which use these views to define the notions of image similarity and data integrity. However, this approach lacks the natural propensity to extract contextual information. We propose group masked model learning (GMML), a self-supervised learning (SSL) mechanism for pretraining vision transformers with the ability to extract the contextual information present in all the concepts in an image. GMML achieves this by randomly manipulating groups of connected tokens, thereby covering a meaningful part of a semantic concept, and then recovering the hidden semantic information from the visible part of the concept. GMML implicitly introduces a novel data augmentation process. Unlike most existing SSL approaches, GMML requires neither a momentum encoder nor careful implementation details such as large batches and gradient stopping, which are artefacts of many current self-supervised learning techniques. The source code is publicly available for the community to train on bigger corpora: https://github.com/Sara-Ahmed/GMML.
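The core of GMML is the masking step: rather than dropping independent random patches, it hides connected groups of tokens so that a meaningful part of a semantic concept is occluded and must be recovered from the visible context. The sketch below illustrates one way such a group mask could be built over a ViT patch grid; the function name `gmml_group_mask`, its parameters, and the square-block grouping are illustrative assumptions, not the official implementation, which is available at the repository linked above.

```python
import torch

def gmml_group_mask(grid_h, grid_w, mask_ratio=0.6, group_size=4):
    """Return a boolean mask over a grid_h x grid_w patch grid.

    Connected square groups of tokens are masked until roughly
    `mask_ratio` of all tokens are hidden, so each masked region
    covers a contiguous chunk of a semantic concept rather than
    isolated patches. (Illustrative sketch; details may differ
    from the official GMML code.)
    """
    mask = torch.zeros(grid_h, grid_w, dtype=torch.bool)
    target = int(mask_ratio * grid_h * grid_w)
    while mask.sum() < target:
        # random top-left corner of a square block of connected tokens
        top = torch.randint(0, grid_h - group_size + 1, (1,)).item()
        left = torch.randint(0, grid_w - group_size + 1, (1,)).item()
        mask[top:top + group_size, left:left + group_size] = True
    return mask.flatten()  # one flag per token, True = masked

# Usage: mask a 14x14 ViT patch grid (e.g. 224x224 image, 16x16 patches)
mask = gmml_group_mask(14, 14)
print(mask.float().mean().item())  # fraction of masked tokens, ~0.6
```

During pretraining, the masked patch embeddings would be replaced (e.g. with noise or a learnable token) and a reconstruction loss computed only at masked positions, such as `loss = (recon - pixels).abs()[mask].mean()`, forcing the model to infer hidden content from the surrounding visible tokens.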
