Paper Title
A Survey on Masked Autoencoder for Self-supervised Learning in Vision and Beyond
Paper Authors
Paper Abstract
Masked autoencoders are scalable vision learners, as the title of MAE \cite{he2022masked} suggests: self-supervised learning (SSL) in vision may follow a trajectory similar to that in NLP. Specifically, generative pretext tasks with masked prediction (e.g., BERT) have become a de facto standard SSL practice in NLP. By contrast, early attempts at generative methods in vision were overshadowed by their discriminative counterparts (such as contrastive learning); however, the success of masked image modeling has revived the masked autoencoder (often termed the denoising autoencoder in the past). As a milestone bridging the gap with BERT in NLP, the masked autoencoder has attracted unprecedented attention for SSL in vision and beyond. This work conducts a comprehensive survey of masked autoencoders to shed light on a promising direction for SSL. As the first to review SSL with masked autoencoders, this work focuses on their applications in vision, discussing historical developments, recent progress, and implications for diverse applications.
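The masked-prediction pretext task the abstract refers to can be sketched in a few lines: split an image into patches, randomly mask most of them, and let the encoder see only the visible subset while the model is trained to reconstruct the rest. The function name `random_masking`, the patch representation, and the specific grid size below are illustrative assumptions, not the paper's implementation; the 75% mask ratio follows the MAE paper.

```python
import random

def random_masking(patches, mask_ratio=0.75, seed=0):
    """MAE-style random masking (illustrative sketch).

    patches: list of per-patch feature vectors (candidate encoder input).
    Returns (visible_patches, visible_idx, masked_idx); only the visible
    patches would be fed to the encoder, and the decoder would be trained
    to reconstruct the patches at masked_idx.
    """
    rng = random.Random(seed)
    n = len(patches)
    n_keep = max(1, int(n * (1 - mask_ratio)))  # patches the encoder sees
    order = list(range(n))
    rng.shuffle(order)                          # random patch permutation
    visible_idx = sorted(order[:n_keep])
    masked_idx = sorted(order[n_keep:])
    visible = [patches[i] for i in visible_idx]
    return visible, visible_idx, masked_idx

# A 14x14 grid of patches (196 patches, e.g. a 224x224 image in 16x16 patches)
patches = [[float(i)] for i in range(196)]
vis, vis_idx, mask_idx = random_masking(patches, mask_ratio=0.75)
print(len(vis), len(mask_idx))  # 49 visible, 147 masked
```

With a 75% mask ratio, only a quarter of the patches reach the encoder, which is what makes the approach computationally attractive at scale.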