A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality

Title

A Framework for Generating Annotated Social Media Corpora with Demographics, Stance, Civility, and Topicality

Authors

Mishra, Shubhanshu; Collier, Daniel

Abstract

In this paper we introduce a framework for annotating social media text corpora for various categories. Since social media data is generated by individuals, it is important to annotate the text with the individuals' demographic attributes to enable a socio-technical analysis of the corpora. Furthermore, when analyzing a large dataset, we can often annotate a small sample of the data and then train a prediction model on this sample to annotate the full data for the relevant categories. We use a case study of a Facebook comment corpus on student loan discussion, which was annotated for gender, military affiliation, age group, political leaning, race, stance, topicality, neoliberal views, and the civility of each comment. We release three datasets of Facebook comments for further research at: https://github.com/socialmediaie/StudentDebtFbComments
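The sample-then-predict workflow described above (annotate a small sample by hand, train a model on it, then label the rest of the corpus) can be sketched as follows. The abstract does not say which prediction model the authors used, so this is only a minimal illustration that swaps in a multinomial naive Bayes classifier with add-one smoothing for a single civility label. The class `NaiveBayesAnnotator`, the function `annotate_corpus`, and the example comments are all hypothetical and are not taken from the released datasets.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase a comment and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesAnnotator:
    """Hypothetical multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.label_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {lab: sum(c.values()) for lab, c in self.word_counts.items()}
        return self

    def predict(self, text):
        n_docs = sum(self.label_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.label_counts.items():
            score = math.log(count / n_docs)  # log prior of the label
            for tok in tokenize(text):
                if tok in self.vocab:  # skip words never seen in the annotated sample
                    num = self.word_counts[label][tok] + 1
                    score += math.log(num / (self.total_words[label] + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def annotate_corpus(sample, unlabeled):
    """Train on a small hand-annotated sample, then label the remaining comments."""
    texts, labels = zip(*sample)
    model = NaiveBayesAnnotator().fit(texts, labels)
    return [(text, model.predict(text)) for text in unlabeled]


# Hypothetical toy data: a hand-annotated sample and unlabeled comments.
sample = [
    ("Thank you for sharing this helpful information", "civil"),
    ("I respectfully disagree with this policy", "civil"),
    ("You are an idiot and a liar", "uncivil"),
    ("Shut up you stupid idiot", "uncivil"),
]
unlabeled = ["Thank you, this is helpful", "What a stupid idiot"]
labeled = annotate_corpus(sample, unlabeled)
print(labeled)
```

In practice, each category in the abstract (gender, stance, topicality, and so on) would need its own annotated sample and model, and a realistic setup would hold out part of the sample to estimate accuracy before trusting the automatic labels.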
