Paper Title

An NLP-Assisted Bayesian Time Series Analysis for Prevalence of Twitter Cyberbullying During the COVID-19 Pandemic

Paper Authors

Christopher Perez, Sayar Karmakar

Abstract

COVID-19 has brought about many changes in social dynamics. Stay-at-home orders and disruptions to school instruction can influence bullying behavior both in person and online, and both forms lead to negative outcomes for victims. To study cyberbullying specifically, 1 million tweets containing keywords associated with abuse were collected with the Twitter API search endpoint, covering the beginning of 2019 to the end of 2021. A natural language processing model pre-trained on a Twitter corpus generated, for each tweet, the probabilities of it being offensive and hateful. To overcome the limitations of sampling, data were also collected using the count endpoint. The fraction of tweets in a given daily sample marked as abusive is multiplied by the number reported by the count endpoint for that day. Once these adjusted counts are assembled, a Bayesian autoregressive Poisson model allows one to study the mean trend and lag functions of the data and how they vary over time. The results reveal strong weekly and yearly seasonality in hateful speech, with slight differences across years that may be attributed to COVID-19.
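The count adjustment described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `adjusted_counts`, the 0.5 probability threshold for marking a tweet as abusive, the rounding to whole tweets, and the toy numbers are all assumptions made for illustration.

```python
def adjusted_counts(daily_probs, daily_totals, threshold=0.5):
    """Scale each day's abusive fraction by that day's total tweet count.

    daily_probs:  dict mapping a day to the model probabilities of the
                  sampled tweets for that day (hypothetical input layout).
    daily_totals: dict mapping a day to the total reported by a count
                  endpoint for the same query.
    threshold:    assumed cutoff above which a tweet is marked abusive.
    """
    adjusted = {}
    for day, probs in daily_probs.items():
        if not probs:
            continue  # no sample for this day, so no fraction to scale
        flagged = sum(1 for p in probs if p >= threshold)
        fraction = flagged / len(probs)
        # Round to an integer, since a Poisson model expects counts.
        adjusted[day] = round(fraction * daily_totals[day])
    return adjusted


# Toy data, invented for illustration only.
daily_probs = {
    "2020-03-01": [0.9, 0.2, 0.7, 0.1],
    "2020-03-02": [0.6, 0.4, 0.3, 0.2, 0.8],
}
daily_totals = {"2020-03-01": 1000, "2020-03-02": 2000}

result = adjusted_counts(daily_probs, daily_totals)
print(result)
```

On the toy data, half of the first day's sample and two fifths of the second day's sample are flagged, so the adjusted counts scale the totals of 1000 and 2000 by those fractions.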
