Exploring Spatial-Temporal Features for Deepfake Detection and Localization

Paper Title

Exploring Spatial-Temporal Features for Deepfake Detection and Localization

Authors

Haiwei Wu, Jiantao Zhou, Shile Zhang, Jinyu Tian

Abstract

With ongoing research on Deepfake forensics, recent studies have attempted to provide fine-grained localization of forgeries, in addition to coarse classification at the video level. However, the detection and localization performance of existing Deepfake forensic methods still has plenty of room for improvement. In this work, we propose a Spatial-Temporal Deepfake Detection and Localization (ST-DDL) network that simultaneously explores spatial and temporal features for detecting and localizing forged regions. Specifically, we design a new Anchor-Mesh Motion (AMM) algorithm to extract temporal (motion) features by modeling the precise geometric movements of facial micro-expressions. Compared with traditional motion extraction methods (e.g., optical flow), which are designed to model large-moving objects, our proposed AMM can better capture small-displacement facial features. The temporal and spatial features are then fused in a Fusion Attention (FA) module based on a Transformer architecture for the eventual Deepfake forensic tasks. The superiority of our ST-DDL network is verified by experimental comparisons with several state-of-the-art competitors, in terms of both video-level and pixel-level detection and localization performance. Furthermore, to promote the future development of Deepfake forensics, we build a public forgery dataset consisting of 6000 videos, with many new features such as production with widely used commercial software (e.g., After Effects), versions transmitted through online social networks, and splicing of multi-source videos. The source code and dataset are available at https://github.com/HighwayWu/ST-DDL.
