Paper title
Which side are you on? Insider-Outsider classification in conspiracy-theoretic social media
Authors
Abstract
Social media is a breeding ground for threat narratives and related conspiracy theories. In these narratives, an outside group threatens the integrity of an inside group, which gives rise to sharply defined group identities: Insiders (agents with whom the authors identify) and Outsiders (agents who threaten the Insiders). Inferring the members of these groups is a challenging new NLP task: (i) information is distributed over many poorly constructed posts; (ii) threats and threat agents are highly contextual, and a single post may assign multiple agents to either group; (iii) an agent's identity is often implicit and transitive; and (iv) phrases used to imply Outsider status often do not follow common negative-sentiment patterns. To address these challenges, we define a novel Insider-Outsider classification task. Because we are not aware of any suitable existing datasets or models, we introduce a labeled dataset (CT5K) and design a model (NP2IO) for this task. NP2IO leverages pretrained language modeling to classify Insiders and Outsiders. NP2IO is shown to be robust, generalizing to noun phrases not seen during training, and it exceeds the performance of non-trivial baseline models by 20%.
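To make the task framing concrete, here is a minimal sketch of what one classification instance might look like: a post, one noun phrase (an agent mention) from that post, and a group label. This is not the CT5K format or the NP2IO model. The names `Role`, `Example`, `NEGATIVE_WORDS`, and `sentiment_baseline`, the extra `NEITHER` label, and the sample posts are all hypothetical illustrations. The naive sentiment baseline shows why challenges (ii) and (iv) matter. It ignores which phrase is being classified, so every agent in a post gets the same label, and it misreads Outsider cues that carry no overtly negative words.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    # Hypothetical label set: the abstract names Insider and Outsider;
    # NEITHER is an assumed extra class for phrases in neither group.
    INSIDER = "insider"
    OUTSIDER = "outsider"
    NEITHER = "neither"


@dataclass(frozen=True)
class Example:
    post: str         # raw social-media post
    noun_phrase: str  # agent mention to classify; must occur in the post
    label: Role

    def __post_init__(self) -> None:
        if self.noun_phrase not in self.post:
            raise ValueError(f"{self.noun_phrase!r} does not occur in the post")


# Hypothetical negative-sentiment lexicon for a deliberately naive baseline.
NEGATIVE_WORDS = {"evil", "corrupt", "lying", "dangerous"}


def sentiment_baseline(ex: Example) -> Role:
    """Predict OUTSIDER if the post contains any negative word, else INSIDER.

    The noun phrase itself is ignored, so all agents in one post
    receive the same prediction (cf. challenge (ii)).
    """
    tokens = {t.strip(".,!?").lower() for t in ex.post.split()}
    return Role.OUTSIDER if tokens & NEGATIVE_WORDS else Role.INSIDER


def accuracy(examples: list[Example], predict) -> float:
    """Fraction of examples whose predicted label matches the gold label."""
    correct = sum(predict(ex) == ex.label for ex in examples)
    return correct / len(examples)


# Illustrative (invented) examples, not drawn from CT5K.
post = "Big Pharma pays the media while our families stay silent."
examples = [
    Example(post, "Big Pharma", Role.OUTSIDER),   # threat implied without negative words
    Example(post, "our families", Role.INSIDER),
    Example("Our doctors are not corrupt, they are heroes.", "Our doctors", Role.INSIDER),
]

if __name__ == "__main__":
    print(accuracy(examples, sentiment_baseline))
```

The baseline gets only the second example right: it labels "Big Pharma" an Insider because the post has no lexicon hits, and labels "Our doctors" an Outsider because of the negated word "corrupt". A phrase-aware, context-sensitive classifier is needed to separate agents that share a post.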