Paper Title
On Explaining Multimodal Hateful Meme Detection Models
Paper Authors
Paper Abstract
Hateful meme detection is a new multimodal task that has gained significant traction in academic and industry research communities. Recently, researchers have applied pre-trained visual-linguistic models to perform the multimodal classification task, and some of these solutions have yielded promising results. However, what these visual-linguistic models learn for the hateful meme classification task remains unclear. For instance, it is unclear whether these models can capture derogatory or slur references across the modalities (i.e., image and text) of hateful memes. To fill this research gap, this paper proposes three research questions to improve our understanding of visual-linguistic models performing the hateful meme classification task. We find that the image modality contributes more to the hateful meme classification task, and that the visual-linguistic models can perform visual-text slur grounding to a certain extent. Our error analysis also shows that the visual-linguistic models have acquired biases, which result in false-positive predictions.
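One of the abstract's research questions concerns how much each modality contributes to the classification decision. The sketch below illustrates the general idea of a modality-ablation probe, assuming the publicly available VisualBERT checkpoint `uclanlp/visualbert-vqa-coco-pre` from Hugging Face `transformers`; the caption text, the random region features, and the use of representation shift as a contribution proxy are illustrative assumptions, not the paper's actual protocol.

```python
# A minimal sketch (not the paper's exact setup) of a modality-ablation probe:
# compare a visual-linguistic model's joint representation with and without
# the image modality. Region features here are random stand-ins for
# detector-extracted features (e.g., from Faster R-CNN).

import torch
from transformers import BertTokenizer, VisualBertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = VisualBertModel.from_pretrained("uclanlp/visualbert-vqa-coco-pre")
model.eval()

# Hypothetical meme caption text.
inputs = tokenizer("sample meme caption", return_tensors="pt")

# VisualBERT expects visual embeddings of shape (batch, num_regions, 2048).
visual_embeds = torch.randn(1, 36, 2048)
visual_mask = torch.ones(1, 36, dtype=torch.long)

def pooled(embeds, mask):
    """Return the model's pooled joint (text+image) representation."""
    with torch.no_grad():
        out = model(
            input_ids=inputs["input_ids"],
            attention_mask=inputs["attention_mask"],
            visual_embeds=embeds,
            visual_attention_mask=mask,
        )
    return out.pooler_output  # shape (1, hidden_size)

full = pooled(visual_embeds, visual_mask)
# Ablate the image modality by zeroing out the region features.
text_only = pooled(torch.zeros_like(visual_embeds), visual_mask)

# A large shift suggests the image modality carries much of the signal,
# in the spirit of the paper's per-modality contribution question.
print("representation shift:", torch.norm(full - text_only).item())
```

A real study would compare classification outputs (or trained probe accuracies) across many memes rather than a single representation distance, but the zero-out ablation pattern is the same.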