Title
Siamese Transition Masked Autoencoders as Uniform Unsupervised Visual Anomaly Detector
Authors
Abstract
Unsupervised visual anomaly detection conveys practical significance in many scenarios and is a challenging task due to the unbounded definition of anomalies. Moreover, most previous methods are application-specific, and establishing a unified model for anomalies across application scenarios remains unsolved. This paper proposes a novel hybrid framework termed Siamese Transition Masked Autoencoders (ST-MAE) to handle various visual anomaly detection tasks uniformly via deep feature transition. Concretely, the proposed method first extracts hierarchical semantic features from a pre-trained deep convolutional neural network and then develops a feature decoupling strategy to split the deep features into two disjoint subsets of feature patches. Leveraging the decoupled features, ST-MAE employs Siamese encoders that operate on each subset of feature patches and perform a latent representation transition between the two subsets, along with a lightweight decoder that reconstructs the original features from the transitioned latent representations. Finally, anomalous attributes can be detected from the semantic deep feature residual. Our deep feature transition scheme yields a nontrivial and semantic self-supervised task for extracting prototypical normal patterns, which allows learning a uniform model that generalizes well across different visual anomaly detection tasks. Extensive experiments demonstrate that the proposed ST-MAE method advances state-of-the-art performance on multiple benchmarks across application scenarios with superior inference efficiency, exhibiting great potential as a uniform model for unsupervised visual anomaly detection.
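To make the pipeline in the abstract concrete, below is a minimal sketch of the described stages: frozen pre-trained CNN features, decoupling of patch tokens into two disjoint subsets, a shared (Siamese) encoder, a cross-subset latent transition, a lightweight decoder, and a deep-feature residual as the anomaly signal. The backbone choice (ResNet-18 truncated after layer3), the random 50/50 split, the transformer block sizes, and the exact transition rule are all illustrative assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn
import torchvision.models as models


class STMAESketch(nn.Module):
    """Illustrative ST-MAE-style pipeline (assumptions noted inline)."""

    def __init__(self, feat_dim=256, num_patches=14 * 14):
        super().__init__()
        # Frozen pre-trained CNN, truncated after layer3 so it emits a
        # 14x14 grid of 256-dim semantic features for 224x224 inputs
        # (assumed backbone; the paper may use a different one).
        backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
        self.backbone = nn.Sequential(*list(backbone.children())[:-3]).eval()
        for p in self.backbone.parameters():
            p.requires_grad = False
        # Learnable positional embedding for the patch tokens.
        self.pos = nn.Parameter(torch.zeros(1, num_patches, feat_dim))

        def block():
            return nn.TransformerEncoderLayer(
                d_model=feat_dim, nhead=8, batch_first=True)

        # Siamese encoder: one set of weights applied to both subsets.
        self.encoder = nn.TransformerEncoder(block(), num_layers=2)
        # Lightweight decoder: reconstructs features from the
        # transitioned latent representation.
        self.decoder = nn.TransformerEncoder(block(), num_layers=1)

    def forward(self, x):
        # 1) Hierarchical deep features -> patch tokens.
        with torch.no_grad():
            f = self.backbone(x)                        # (B, C, H, W)
        tokens = f.flatten(2).transpose(1, 2) + self.pos  # (B, HW, C)
        n = tokens.shape[1]
        # 2) Feature decoupling: two disjoint, equal-size patch subsets
        #    (assumed: a random permutation split).
        perm = torch.randperm(n, device=x.device)
        ia, ib = perm[: n // 2], perm[n // 2:]
        za = self.encoder(tokens[:, ia])
        zb = self.encoder(tokens[:, ib])
        # 3) Latent transition: each subset's latent is decoded to
        #    reconstruct the *other* subset's features.
        rec_b = self.decoder(za)
        rec_a = self.decoder(zb)
        # 4) Semantic deep-feature residual per patch. Training on
        #    normal data minimizes it; at test time large residuals
        #    indicate anomalous patches.
        res = torch.zeros(tokens.shape[:2], device=x.device)
        res[:, ia] = (tokens[:, ia] - rec_a).pow(2).mean(-1)
        res[:, ib] = (tokens[:, ib] - rec_b).pow(2).mean(-1)
        return res  # (B, num_patches) patch-level anomaly evidence


if __name__ == "__main__":
    model = STMAESketch()
    scores = model(torch.randn(2, 3, 224, 224))
    print(scores.shape)  # torch.Size([2, 196])
```

At training time, the mean residual over normal images serves as the self-supervised reconstruction loss; at inference, the per-patch residual map can be reshaped to the 14x14 feature grid and upsampled to localize anomalies, while its maximum or mean gives an image-level score.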