Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

Title

Graph Neural Networks with Continual Learning for Fake News Detection from Social Media

Authors

Han, Yi; Karunasekera, Shanika; Leckie, Christopher

Abstract

Although significant effort has been applied to fact-checking, the prevalence of fake news on social media, which has a profound impact on justice, public trust and our society, remains a serious problem. In this work, we focus on propagation-based fake news detection, as recent studies have demonstrated that fake news and real news spread differently online. Specifically, considering the capability of graph neural networks (GNNs) to deal with non-Euclidean data, we use GNNs to differentiate between the propagation patterns of fake and real news on social media. In particular, we concentrate on two questions: (1) Without relying on any text information, e.g., tweet content, replies and user descriptions, how accurately can GNNs identify fake news? Machine learning models are known to be vulnerable to adversarial attacks, and avoiding the dependence on text-based features can make the model less susceptible to manipulation by advanced fake news fabricators. (2) How should new, unseen data be handled? In other words, how does a GNN trained on a given dataset perform on a new and potentially vastly different dataset? If it achieves unsatisfactory performance, how can the problem be solved without re-training the model on the entire data from scratch? We study these questions on two datasets with thousands of labelled news items, and our results show that: (1) GNNs can achieve performance comparable or superior to state-of-the-art methods without any text information. (2) GNNs trained on a given dataset may perform poorly on new, unseen data, and direct incremental training cannot solve the problem; this issue has not been addressed in previous work that applies GNNs to fake news detection. To solve the problem, we propose a method that achieves balanced performance on both existing and new datasets by using techniques from continual learning to train GNNs incrementally.
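The abstract does not name the continual-learning technique used to train the GNN incrementally. As one possible illustration, the sketch below applies Elastic Weight Consolidation (EWC), which adds the penalty (λ/2)·Σᵢ Fᵢ(θᵢ − θᵢ*)² to the loss on the new dataset, so that weights that mattered for the existing dataset (large diagonal Fisher value Fᵢ at the old solution θ*) change little. It is a minimal sketch under stated assumptions, not the authors' method: a logistic-regression model stands in for the GNN, the input features are hypothetical text-free propagation statistics, the toy data is made up, and every function name (`train`, `diag_fisher`, `ewc_penalty`) is illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

# (feature vector, label) with label 1 = fake, 0 = real.
# Features are hypothetical text-free propagation statistics; entry 0 is a bias term.
Example = Tuple[List[float], int]


def predict(w: Sequence[float], x: Sequence[float]) -> float:
    """Probability of 'fake' from a logistic-regression stand-in for a GNN."""
    z = sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def nll_grad(w: Sequence[float], x: Sequence[float], y: int) -> List[float]:
    """Gradient of the binary cross-entropy loss for a single example."""
    err = predict(w, x) - y
    return [err * xi for xi in x]


def diag_fisher(w: Sequence[float], data: Sequence[Example]) -> List[float]:
    """Diagonal empirical Fisher: mean squared per-example gradient at w."""
    fisher = [0.0] * len(w)
    for x, y in data:
        for i, g in enumerate(nll_grad(w, x, y)):
            fisher[i] += g * g / len(data)
    return fisher


def ewc_penalty(w: Sequence[float], anchor: Sequence[float],
                fisher: Sequence[float], lam: float) -> float:
    """EWC term: (lam / 2) * sum_i F_i * (w_i - anchor_i)^2."""
    return 0.5 * lam * sum(f * (wi - ai) ** 2 for f, wi, ai in zip(fisher, w, anchor))


def train(w: Sequence[float], data: Sequence[Example], epochs: int = 200,
          lr: float = 0.1, anchor: Optional[Sequence[float]] = None,
          fisher: Optional[Sequence[float]] = None, lam: float = 0.0) -> List[float]:
    """Full-batch gradient descent on the mean loss, plus the EWC term if anchor and fisher are given."""
    w = list(w)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in data:
            for i, g in enumerate(nll_grad(w, x, y)):
                grad[i] += g / len(data)
        if anchor is not None and fisher is not None:
            for i in range(len(w)):
                # Derivative of the EWC penalty with respect to w_i.
                grad[i] += lam * fisher[i] * (w[i] - anchor[i])
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


if __name__ == "__main__":
    # Toy data: [bias, cascade depth, cascade breadth], made up for illustration.
    old_data = [([1.0, 0.9, 0.2], 1), ([1.0, 0.1, 0.8], 0),
                ([1.0, 0.8, 0.3], 1), ([1.0, 0.2, 0.7], 0)]
    new_data = [([1.0, 0.3, 0.9], 1), ([1.0, 0.7, 0.1], 0)]

    w_old = train([0.0, 0.0, 0.0], old_data)     # 1. train on the existing dataset
    fisher = diag_fisher(w_old, old_data)       # 2. per-weight importance for that dataset
    w_new = train(w_old, new_data, anchor=w_old, fisher=fisher, lam=10.0)  # 3. incremental step
    correct = sum((predict(w_new, x) > 0.5) == bool(y) for x, y in old_data)
    print("accuracy on old data after incremental training:", correct / len(old_data))
```

Setting `lam` to 0 reduces step 3 to plain incremental training on the new data, the baseline the abstract reports as insufficient; larger values trade fit on the new dataset for retention of the old one. The value 10.0 here is arbitrary.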
