Detection of Increased Time Intervals of Anti-Vaccine Tweets for COVID-19 Vaccine with BERT Model

Paper Title

Detection of Increased Time Intervals of Anti-Vaccine Tweets for COVID-19 Vaccine with BERT Model

Authors

Küçüktaş, Ülkü Tuncer, Uysal, Fatih, Hardalaç, Fırat, Biri, İsmail

Abstract

Vaccines are the most effective of the solutions developed against COVID-19. Distrust of vaccines can hinder the rapid and effective use of this remedy. Social media is one of the means by which society expresses its thoughts. Determining the time intervals during which anti-vaccine sentiment increases on social media can help institutions choose a strategy for combating it. Recording and tracking every tweet by hand would be inefficient, so automated solutions are needed. In this study, the Bidirectional Encoder Representations from Transformers (BERT) model, a deep learning-based natural language processing (NLP) model, was used. The model was trained for 25 epochs with a learning rate of 5e-6 on a dataset of 1506 tweets divided into four categories: news, irrelevant, anti-vaccine, and vaccine supporters. To determine the intervals in which anti-vaccine tweets are concentrated, the trained model was used to classify 652840 tweets. The change in the predicted categories over time was visualized, and events that could have caused the changes were identified. On the test dataset, the trained model obtained an F-score of 0.81 and AUC values of 0.99, 0.91, 0.92, and 0.92 for the different classes, respectively. Unlike studies in the literature that focus on detecting and censoring such tweets, this work designs an auxiliary system that measures and visualizes the frequency of anti-vaccine tweets over a time interval, providing data that institutions can use when determining their strategy.
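
The interval-detection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each tweet has already been assigned a label by some classifier, groups tweets by calendar week, and flags weeks where the anti-vaccine share rises. The function names, the label string "anti-vaccine", the weekly granularity, and the 0.10 threshold are all hypothetical choices made for illustration.

```python
from collections import Counter, defaultdict
from datetime import date, timedelta


def week_start(day: date) -> date:
    """Return the Monday of the week containing `day`."""
    return day - timedelta(days=day.weekday())


def anti_vaccine_share_by_week(labelled):
    """Map each week (keyed by its Monday) to the fraction of tweets labelled anti-vaccine.

    `labelled` is an iterable of (date, label) pairs; the labels are assumed
    to come from an already trained classifier.
    """
    counts = defaultdict(Counter)
    for day, label in labelled:
        counts[week_start(day)][label] += 1
    return {
        week: c["anti-vaccine"] / sum(c.values())
        for week, c in sorted(counts.items())
    }


def rising_weeks(shares, min_increase=0.10):
    """Return weeks whose share exceeds the previous observed week's by at least `min_increase`."""
    weeks = list(shares)  # dict preserves the sorted insertion order
    return [
        cur
        for prev, cur in zip(weeks, weeks[1:])
        if shares[cur] - shares[prev] >= min_increase
    ]


# Made-up example data, not taken from the paper.
sample = [
    (date(2021, 1, 4), "news"),
    (date(2021, 1, 5), "anti-vaccine"),
    (date(2021, 1, 6), "pro-vaccine"),
    (date(2021, 1, 7), "irrelevant"),
    (date(2021, 1, 11), "anti-vaccine"),
    (date(2021, 1, 12), "anti-vaccine"),
    (date(2021, 1, 13), "news"),
    (date(2021, 1, 14), "pro-vaccine"),
]

shares = anti_vaccine_share_by_week(sample)
flagged = rising_weeks(shares)
```

With this made-up sample, the anti-vaccine share moves from 0.25 in the week of 2021-01-04 to 0.5 in the week of 2021-01-11, so only the second week is flagged. Using a share rather than a raw count keeps weeks with different tweet volumes comparable.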
