Title


Human Detection of Political Speech Deepfakes across Transcripts, Audio, and Video

Authors

Groh, Matthew, Sankaranarayanan, Aruna, Singh, Nikhil, Kim, Dong Young, Lippman, Andrew, Picard, Rosalind

Abstract


Recent advances in technology for hyper-realistic visual and audio effects provoke the concern that deepfake videos of political speeches will soon be indistinguishable from authentic video recordings. Conventional wisdom in communication theory predicts that people will fall for fake news more often when the same version of a story is presented as video rather than text. We conduct 5 pre-registered randomized experiments with 2,215 participants to evaluate how accurately humans distinguish real political speeches from fabrications across base rates of misinformation, audio sources, question framings, and media modalities. We find that base rates of misinformation minimally influence discernment, and that deepfakes with audio produced by state-of-the-art text-to-speech algorithms are harder to discern than the same deepfakes with voice-actor audio. Moreover, across all experiments, we find that audio and visual information enables more accurate discernment than text alone: human discernment relies more on how something is said (the audio-visual cues) than on what is said (the speech content).
