Paper title
Voice-Face Homogeneity Tells Deepfake
Paper authors
Paper abstract
Detecting forged videos is highly desirable given the widespread abuse of deepfakes. Existing detection approaches focus on specific artifacts in deepfake videos and fit well on certain data. However, steadily improving techniques that target these artifacts keep challenging the robustness of traditional deepfake detectors. As a result, progress on the generalizability of these approaches has stalled. To address this issue, we build on two empirical observations: the identities behind voices and faces are often mismatched in deepfake videos, and voices and faces are homogeneous to some extent. In this paper, we propose to perform deepfake detection from an unexplored voice-face matching view. To this end, we devise a voice-face matching method that measures the matching degree between the two. Nevertheless, training on specific deepfake datasets makes the model overfit to certain traits of deepfake algorithms. Instead, we advocate a method that quickly adapts to unseen forgeries through a pre-training then fine-tuning paradigm. Specifically, we first pre-train the model on a generic audio-visual dataset and then fine-tune it on downstream deepfake data. We conduct extensive experiments on three widely used deepfake datasets: DFDC, FakeAVCeleb, and DeepfakeTIMIT. Our method obtains significant performance gains compared with other state-of-the-art competitors. It is also worth noting that our method already achieves competitive results when fine-tuned on limited deepfake data.
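The abstract does not say how the matching degree is computed, so the following is only a minimal sketch of the general idea, assuming that a voice encoder and a face encoder have already mapped a clip to fixed-length embeddings. The function names `cosine_similarity` and `is_likely_deepfake`, the choice of cosine similarity as the matching score, and the `threshold` value are all illustrative assumptions, not the paper's method.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def is_likely_deepfake(
    voice_emb: Sequence[float],
    face_emb: Sequence[float],
    threshold: float = 0.5,  # hypothetical value; would be tuned on held-out data
) -> bool:
    """Flag a clip when its voice and face embeddings match poorly.

    Assumption: a low matching score suggests that the identities behind
    the voice and the face differ, which the abstract reports is common
    in deepfake videos.
    """
    return cosine_similarity(voice_emb, face_emb) < threshold
```

In a pre-training then fine-tuning setup, the encoders that produce such embeddings would first be trained on generic audio-visual data, and only the later stage would see deepfake examples; this sketch covers the scoring step alone.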