Title
Attacker Attribution of Audio Deepfakes
Authors
Abstract
Deepfakes are synthetically generated media, often devised with malicious intent. With large training datasets and advanced neural networks, they have become increasingly convincing. These fakes are readily misused for slander, misinformation and fraud. For this reason, research into countermeasures is also expanding rapidly. However, recent work is almost exclusively limited to deepfake detection: predicting whether audio is real or fake. This is despite the fact that attribution (who created which fake?) is an essential building block of a larger defense strategy, as has long been practiced in the field of cybersecurity. This paper considers the problem of deepfake attacker attribution in the audio domain. We present several methods for creating attacker signatures using low-level acoustic descriptors and machine learning embeddings. We show that speech signal features are inadequate for characterizing attacker signatures. However, we also demonstrate that embeddings from a recurrent neural network can successfully characterize attacks from both known and unknown attackers. Our attack signature embeddings form distinct clusters, for both seen and unseen audio deepfakes. We show that these embeddings can be used in downstream tasks to high effect, scoring 97.10% accuracy in attacker-ID classification.
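The pipeline the abstract describes can be sketched in miniature: a recurrent network maps a variable-length sequence of acoustic features to a fixed-length "attack signature" embedding, and attacker-ID classification is then performed on those embeddings. The sketch below is illustrative only; the tiny untrained tanh RNN, the synthetic per-attacker bias used as a stand-in signature, and the nearest-centroid classifier are all assumptions, not the paper's actual architecture or features.

```python
import numpy as np

rng = np.random.default_rng(0)
FEAT_DIM, HID_DIM = 20, 16

# Random, untrained RNN weights (illustrative stand-in for a trained model).
W_in = rng.normal(scale=0.3, size=(HID_DIM, FEAT_DIM))
W_rec = rng.normal(scale=0.3, size=(HID_DIM, HID_DIM))

def embed(frames):
    """Map a (T, FEAT_DIM) acoustic-feature sequence to a fixed-length
    embedding: run a simple tanh RNN and keep the final hidden state."""
    h = np.zeros(HID_DIM)
    for x in frames:
        h = np.tanh(W_in @ x + W_rec @ h)
    return h

def make_utterances(attacker_bias, n):
    """Synthetic 'deepfake' feature sequences: each attacker shifts the
    features by a characteristic bias vector (a toy attacker signature)."""
    return [rng.normal(size=(50, FEAT_DIM)) * 0.1 + attacker_bias
            for _ in range(n)]

# Three hypothetical attackers, each with a fixed signature bias.
biases = {a: rng.normal(size=FEAT_DIM) for a in ("A01", "A02", "A03")}
train = {a: [embed(u) for u in make_utterances(b, 10)]
         for a, b in biases.items()}
centroids = {a: np.mean(e, axis=0) for a, e in train.items()}

def classify(frames):
    """Nearest-centroid attacker-ID classification in embedding space."""
    e = embed(frames)
    return min(centroids, key=lambda a: np.linalg.norm(e - centroids[a]))

# Held-out utterances from each attacker should land near that
# attacker's centroid, i.e. the embeddings cluster by attacker.
tests = [(a, u) for a, b in biases.items() for u in make_utterances(b, 5)]
accuracy = np.mean([classify(u) == a for a, u in tests])
print(f"attacker-ID accuracy on synthetic data: {accuracy:.2f}")
```

Because each attacker's bias dominates the per-frame noise, same-attacker embeddings cluster tightly and the nearest-centroid rule separates them; this mirrors, in toy form, the clustering behaviour the abstract reports for seen and unseen deepfakes.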