Evaluation of Fake News Detection with Knowledge-Enhanced Language Models

Paper title

Evaluation of Fake News Detection with Knowledge-Enhanced Language Models

Authors

Whitehouse, Chenxi; Weyde, Tillman; Madhyastha, Pranava; Komninos, Nikos

Abstract

Recent advances in fake news detection have exploited the success of large-scale pre-trained language models (PLMs). The predominant state-of-the-art approaches are based on fine-tuning PLMs on labelled fake news datasets. However, large-scale PLMs are generally not trained on structured factual data and hence may not possess priors that are grounded in factually accurate knowledge. The use of existing knowledge bases (KBs) with rich human-curated factual information thus has the potential to make fake news detection more effective and robust. In this paper, we investigate the impact of knowledge integration into PLMs for fake news detection. We study several state-of-the-art approaches for knowledge integration, mostly using Wikidata as the KB, on two popular fake news datasets: LIAR, a politics-based dataset, and COVID-19, a dataset of messages posted on social media relating to the COVID-19 pandemic. Our experiments show that knowledge-enhanced models can significantly improve fake news detection on LIAR, where the KB is relevant and up-to-date. The mixed results on COVID-19 highlight the reliance on stylistic features and the importance of domain-specific and current KBs.
