Paper Title

UIA-ViT: Unsupervised Inconsistency-Aware Method based on Vision Transformer for Face Forgery Detection

Paper Authors

Wanyi Zhuang, Qi Chu, Zhentao Tan, Qiankun Liu, Haojie Yuan, Changtao Miao, Zixiang Luo, Nenghai Yu

Paper Abstract

Intra-frame inconsistency has been proved to be effective for the generalization of face forgery detection. However, learning to focus on these inconsistencies requires extra pixel-level forged location annotations. Acquiring such annotations is non-trivial. Some existing methods generate large-scale synthesized data with location annotations, which is only composed of real images and cannot capture the properties of forgery regions. Others generate forgery location labels by subtracting paired real and fake images, yet such paired data is difficult to collect and the generated labels are usually discontinuous. To overcome these limitations, we propose a novel Unsupervised Inconsistency-Aware method based on Vision Transformer, called UIA-ViT, which only makes use of video-level labels and can learn inconsistency-aware features without pixel-level annotations. Due to the self-attention mechanism, the attention map among patch embeddings naturally represents the consistency relation, making the Vision Transformer suitable for consistency representation learning. Based on the Vision Transformer, we propose two key components: Unsupervised Patch Consistency Learning (UPCL) and Progressive Consistency Weighted Assemble (PCWA). UPCL is designed to learn the consistency-related representation with progressively optimized pseudo annotations. PCWA enhances the final classification embedding with previous patch embeddings optimized by UPCL to further improve the detection performance. Extensive experiments demonstrate the effectiveness of the proposed method.
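The abstract only sketches how the self-attention among patch embeddings acts as a consistency signal and how PCWA assembles the final classification embedding. The snippet below is a minimal NumPy sketch of that general idea, not the authors' implementation: the mean attention a patch exchanges with the others is read as a per-patch consistency score, and the scores weight the patch embeddings before they are concatenated with the class embedding for classification. All names here (consistency_weighted_assemble, cls_emb, patch_emb, attention) are hypothetical.

import numpy as np

def consistency_weighted_assemble(cls_emb, patch_emb, attention):
    """Hypothetical sketch of a consistency-weighted assembly.

    cls_emb:   (D,)   class-token embedding from the last Transformer block
    patch_emb: (N, D) patch embeddings from an earlier block
    attention: (N, N) averaged self-attention among the N patch tokens,
               read here as a patch-to-patch consistency map
    """
    # A patch that attends consistently to the other patches gets a high
    # score; a softmax turns the scores into assembly weights.
    consistency_score = attention.mean(axis=1)        # (N,)
    weights = np.exp(consistency_score)
    weights /= weights.sum()
    # Pool the patch embeddings with the consistency weights and
    # concatenate with the class embedding for the final classifier.
    pooled = weights @ patch_emb                      # (D,)
    return np.concatenate([cls_emb, pooled])          # (2D,)

# Toy usage with random arrays standing in for ViT outputs.
rng = np.random.default_rng(0)
N, D = 196, 768
feat = consistency_weighted_assemble(
    cls_emb=rng.standard_normal(D),
    patch_emb=rng.standard_normal((N, D)),
    attention=rng.random((N, N)),
)
print(feat.shape)  # (1536,)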
