A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19 Misinformation Topics

Paper Title

A Weakly-Supervised Iterative Graph-Based Approach to Retrieve COVID-19 Misinformation Topics

Authors

Wang, Harry; Guntuku, Sharath Chandra

Abstract

The COVID-19 pandemic has been accompanied by an 'infodemic' of accurate and inaccurate health information across social media. Detecting misinformation amidst a dynamically changing information landscape is challenging; identifying relevant keywords and posts is arduous due to the large amount of human effort required to inspect the content and sources of posts. We aim to reduce the resource cost of this process by introducing a weakly-supervised iterative graph-based approach to detect keywords, topics, and themes related to misinformation, with a focus on COVID-19. Our approach can successfully detect specific topics from general misinformation-related seed words in a few seed texts. Our approach utilizes the BERT-based Word Graph Search (BWGS) algorithm that builds on context-based neural network embeddings for retrieving misinformation-related posts. We utilize Latent Dirichlet Allocation (LDA) topic modeling for obtaining misinformation-related themes from the texts returned by BWGS. Furthermore, we propose the BERT-based Multi-directional Word Graph Search (BMDWGS) algorithm that utilizes greater starting context information for misinformation extraction. In addition to a qualitative analysis of our approach, our quantitative analyses show that BWGS and BMDWGS are effective in extracting misinformation-related content compared to common baselines in low data resource settings. Extracting such content is useful for uncovering prevalent misconceptions and concerns and for facilitating precision public health messaging campaigns to improve health behaviors.
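
The abstract does not spell out how BWGS traverses its word graph, so the following is only a rough illustration of the general idea of iteratively expanding seed words over an embedding-similarity graph. It is not the authors' algorithm: the function name `expand_keywords`, the similarity threshold, and the hand-made 3-dimensional vectors (standing in for BERT embeddings) are all assumptions made for this sketch.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_keywords(seeds, embeddings, threshold=0.9, max_iters=10):
    """Hypothetical helper: grow a keyword set outward from seed words.

    Two words are treated as graph neighbours when the cosine similarity
    of their vectors is at least `threshold`. Each iteration visits the
    neighbours of the words found in the previous iteration.
    """
    selected = list(seeds)   # keywords in discovery order
    frontier = list(seeds)   # words whose neighbours are visited next
    for _ in range(max_iters):
        next_frontier = []
        for word in frontier:
            for candidate, vector in embeddings.items():
                if candidate in selected:
                    continue
                if cosine(embeddings[word], vector) >= threshold:
                    selected.append(candidate)
                    next_frontier.append(candidate)
        if not next_frontier:
            break            # no new keywords were reached
        frontier = next_frontier
    return selected


# Toy vectors invented for illustration; real input would be
# contextual embeddings produced by a BERT-style model.
TOY_EMBEDDINGS = {
    "hoax": (1.0, 0.0, 0.0),
    "fake": (0.9, 0.1, 0.0),
    "plandemic": (0.8, 0.4, 0.0),
    "5g": (0.5, 0.8, 0.0),
    "weather": (0.0, 0.0, 1.0),
}

keywords = expand_keywords(["hoax"], TOY_EMBEDDINGS, threshold=0.9)
print(keywords)
```

With these toy vectors, "plandemic" is not similar enough to the seed "hoax" to be picked up directly, but it is reached in the second iteration through "fake". That is the kind of behaviour an iterative search adds over a single nearest-neighbour lookup. In a pipeline like the one the abstract describes, the posts containing the retrieved keywords would then be passed to a topic model such as LDA.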
