Title
Twitter Corpus of the #BlackLivesMatter Movement And Counter Protests: 2013 to 2021
Authors
Abstract
Black Lives Matter (BLM) is a decentralized social movement protesting violence against Black individuals and communities, with a focus on police brutality. The movement gained significant attention following the killings of Ahmaud Arbery, Breonna Taylor, and George Floyd in 2020. The #BlackLivesMatter social media hashtag has come to represent the grassroots movement, while similar hashtags, such as #AllLivesMatter and #BlueLivesMatter, are used to counter-protest it. We introduce a data set of 63.9 million tweets from 13.0 million users in over 100 countries, each containing one or more of the following keywords: BlackLivesMatter, AllLivesMatter, and BlueLivesMatter. This data set contains all currently available tweets from the beginning of the BLM movement in 2013 to 2021. We summarize the data set and show temporal trends in the use of both the BlackLivesMatter keyword and the keywords associated with the counter-movements. Additionally, for each keyword, we create and release a set of Latent Dirichlet Allocation (LDA) topics (i.e., automatically clustered groups of semantically co-occurring words) to help researchers identify linguistic patterns across the three keywords.
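The abstract describes two steps: tagging tweets by the three keywords and tracking how often each keyword is used over time. The Python sketch below shows one way to do both. The abstract does not describe the actual collection or counting pipeline, so several details here are assumptions: the case-insensitive substring matching rule, the `(timestamp, text)` input format, and the helper names `match_keywords` and `monthly_counts`.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# The three keywords named in the abstract. Matching them case-insensitively
# as plain substrings (with or without a leading '#') is an assumption of
# this sketch, not the authors' documented rule.
KEYWORDS = ("BlackLivesMatter", "AllLivesMatter", "BlueLivesMatter")
_PATTERNS = {kw: re.compile(re.escape(kw), re.IGNORECASE) for kw in KEYWORDS}


def match_keywords(text):
    """Return the set of KEYWORDS that occur anywhere in a tweet's text."""
    return {kw for kw, pat in _PATTERNS.items() if pat.search(text)}


def monthly_counts(tweets):
    """Count tweets per keyword per year-month.

    `tweets` is an iterable of (iso_timestamp, text) pairs, a hypothetical
    input format. A tweet that mentions several keywords is counted once
    under each of them.
    """
    counts = defaultdict(Counter)
    for ts, text in tweets:
        month = datetime.fromisoformat(ts).strftime("%Y-%m")
        for kw in match_keywords(text):
            counts[kw][month] += 1
    return counts


if __name__ == "__main__":
    sample = [
        ("2020-06-01T12:00:00+00:00", "Marching today #BlackLivesMatter"),
        ("2020-06-02T08:30:00+00:00", "#AllLivesMatter"),
        ("2020-07-04T09:00:00+00:00", "#blacklivesmatter #BlueLivesMatter"),
    ]
    counts = monthly_counts(sample)
    print(dict(counts["BlackLivesMatter"]))  # {'2020-06': 1, '2020-07': 1}
```

Plain substring matching also catches variants such as `#BlackLivesMatter2020`, which a word-boundary regex would miss. The trade-off is that it can also match keywords embedded in longer, unrelated tokens.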