Paper Title
A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality
Paper Authors
Paper Abstract
In this paper, we introduce a framework for annotating social media text corpora for various categories. Since social media data is generated by individuals, it is important to annotate the text for the individuals' demographic attributes to enable a socio-technical analysis of the corpora. Furthermore, when analyzing a large dataset, we can often annotate a small sample of the data and then train a prediction model on this sample to annotate the full data for the relevant categories. We use a case study of a Facebook comment corpus on student loan discussion, which was annotated for gender, military affiliation, age group, political leaning, race, stance, topicality, neoliberal views, and civility of the comment. We release three datasets of Facebook comments for further research at: https://github.com/socialmediaie/StudentDebtFbComments