Paper Title


Cross-lingual Transfer Learning for Check-worthy Claim Identification over Twitter

Authors

Maram Hasanain, Tamer Elsayed

Abstract


Misinformation spread over social media has become an undeniable infodemic. However, not all spreading claims are equal. If propagated, some claims can be destructive, not only at the individual level, but to organizations and even countries. Detecting claims that should be prioritized for fact-checking is considered the first step in fighting the spread of fake news. With training data limited to a handful of languages, developing supervised models to tackle the problem over lower-resource languages is currently infeasible. Therefore, our work aims to investigate whether we can use existing datasets to train models for predicting the check-worthiness of claims in tweets in other languages. We present a systematic comparative study of six approaches to cross-lingual check-worthiness estimation across pairs of five diverse languages with the help of the Multilingual BERT (mBERT) model. We run our experiments using a state-of-the-art multilingual Twitter dataset. Our results show that for some language pairs, zero-shot cross-lingual transfer is possible and can perform as well as monolingual models trained on the target language. We also show that in some languages, this approach outperforms (or is at least comparable to) state-of-the-art models.
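The zero-shot setup described above can be sketched as a simple protocol: fine-tune a classifier on check-worthiness labels in a source language, then evaluate it unchanged on target-language tweets. The following is a minimal illustration of that protocol only; a tiny character-n-gram Naive Bayes stands in for mBERT fine-tuning, and all example tweets and labels are invented placeholders, not the paper's dataset.

```python
# Sketch of the zero-shot cross-lingual transfer protocol: train on a
# "source-language" set, then apply the same model, unchanged, to
# "target-language" inputs. TinyNB is a toy stand-in for mBERT.
from collections import Counter
import math

def char_ngrams(text, n=3):
    """Character n-grams, a crude language-agnostic feature."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]

class TinyNB:
    """Multinomial Naive Bayes with add-one smoothing over char n-grams."""
    def fit(self, texts, labels):
        self.counts = {0: Counter(), 1: Counter()}
        self.priors = Counter(labels)
        for t, y in zip(texts, labels):
            self.counts[y].update(char_ngrams(t))
        self.vocab = set(self.counts[0]) | set(self.counts[1])
        return self

    def predict(self, text):
        scores = {}
        total_docs = sum(self.priors.values())
        for y in (0, 1):
            denom = sum(self.counts[y].values()) + len(self.vocab)
            score = math.log(self.priors[y] / total_docs)
            for g in char_ngrams(text):
                score += math.log((self.counts[y][g] + 1) / denom)
            scores[y] = score
        return max(scores, key=scores.get)

# Invented "source-language" training tweets (1 = check-worthy claim).
train = [
    ("the vaccine causes illness says report", 1),
    ("breaking: 5g towers spread the virus", 1),
    ("good morning everyone have a nice day", 0),
    ("i love this song so much", 0),
]
model = TinyNB().fit([t for t, _ in train], [y for _, y in train])

# Zero-shot step: the trained model is applied as-is to new tweets
# (in the paper, these would be in a different language).
print(model.predict("report says the vaccine causes illness"))
```

In the paper's actual setting, `TinyNB` would be replaced by mBERT fine-tuned for binary classification, whose multilingual pretraining is what makes the cross-lingual transfer meaningful; the toy model here only demonstrates the train-on-source / evaluate-on-target workflow.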
