Paper Title

Exploring Spatial-Temporal Features for Deepfake Detection and Localization

Authors

Haiwei Wu, Jiantao Zhou, Shile Zhang, Jinyu Tian

Abstract

With the continuing research on Deepfake forensics, recent studies have attempted to provide fine-grained localization of forgeries in addition to coarse classification at the video level. However, the detection and localization performance of existing Deepfake forensic methods still has plenty of room for improvement. In this work, we propose a Spatial-Temporal Deepfake Detection and Localization (ST-DDL) network that simultaneously explores spatial and temporal features for detecting and localizing forged regions. Specifically, we design a new Anchor-Mesh Motion (AMM) algorithm to extract temporal (motion) features by modeling the precise geometric movements of facial micro-expressions. Compared with traditional motion extraction methods (e.g., optical flow) designed to model large-moving objects, our proposed AMM can better capture small-displacement facial features. The temporal and spatial features are then fused in a Fusion Attention (FA) module based on a Transformer architecture for the eventual Deepfake forensic tasks. The superiority of our ST-DDL network is verified by experimental comparisons with several state-of-the-art competitors, in terms of both video-level and pixel-level detection and localization performance. Furthermore, to facilitate the future development of Deepfake forensics, we build a public forgery dataset consisting of 6000 videos, with many new features such as production with widely used commercial software (e.g., After Effects), versions transmitted through online social networks, and spliced multi-source videos. The source code and dataset are available at https://github.com/HighwayWu/ST-DDL.
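To make the idea of fusing temporal and spatial features with attention more concrete, the sketch below shows plain scaled dot-product cross-attention, in which each spatial token attends over a set of temporal (motion) tokens. It is not the paper's FA module: the function name `cross_attention_fuse`, the absence of learned projections, and the residual sum are all assumptions made for illustration only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_fuse(spatial, temporal):
    """Fuse spatial tokens with temporal tokens (hypothetical helper).

    spatial:  list of query vectors (one per spatial token)
    temporal: list of key/value vectors (one per temporal token)
    Both must share the same dimensionality. No learned projections
    are used here; a real Transformer layer would add them.
    """
    dim = len(spatial[0])
    fused = []
    for query in spatial:
        # Scaled dot-product similarity between the query and every key.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in temporal
        ]
        weights = softmax(scores)
        # Attention-weighted average of the temporal tokens.
        context = [
            sum(w * value[i] for w, value in zip(weights, temporal))
            for i in range(dim)
        ]
        # Residual sum keeps the original spatial information.
        fused.append([q + c for q, c in zip(query, context)])
    return fused


# Toy usage: two spatial tokens attend over three temporal tokens.
spatial_tokens = [[1.0, 0.0], [0.0, 1.0]]
temporal_tokens = [[0.5, 0.1], [0.2, 0.9], [0.3, 0.3]]
fused_tokens = cross_attention_fuse(spatial_tokens, temporal_tokens)
```

The output has one fused vector per spatial token, so a pixel-level localization head could still be applied on top of it; that downstream design is likewise an assumption, not something the abstract specifies.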
