Paper title
Semi-supervised Stance Detection of Tweets Via Distant Network Supervision
Paper authors
Paper abstract
Detecting and labeling stance in social media text is strongly motivated by hate speech detection, poll prediction, engagement forecasting, and concerted propaganda detection. Today's best neural stance detectors need large volumes of training data, which is difficult to curate given the fast-changing landscape of social media text and the issues on which users opine. Homophily properties over the social network provide a strong signal of coarse-grained user-level stance. But semi-supervised approaches for tweet-level stance detection fail to properly leverage homophily. In light of this, we present SANDS, a new semi-supervised stance detector. SANDS starts from very few labeled tweets. It builds multiple deep feature views of tweets. It also uses a distant supervision signal from the social network to provide a surrogate loss signal to the component learners. We prepare two new tweet datasets comprising over 236,000 politically tinted tweets from two demographics (US and India) posted by over 87,000 users, their follower-followee graph, and over 8,000 tweets annotated by linguists. SANDS achieves a macro-F1 score of 0.55 (0.49) on US (India)-based datasets, outperforming 17 baselines (including variants of SANDS) substantially, particularly for minority stance labels and noisy text. Numerous ablation experiments on SANDS disentangle the dynamics of textual and network-propagated stance signals.
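The abstract reports macro-F1 and stresses minority stance labels. As a minimal sketch of why the two go together, the snippet below computes macro-F1 as the unweighted mean of per-label F1 scores. This is the standard definition, not the authors' evaluation code; the function name `macro_f1`, the label strings, and the toy predictions are hypothetical and chosen only for illustration.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over all labels seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    if not labels:
        return 0.0
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted label gains a false positive
            fn[truth] += 1  # true label gains a false negative
    scores = []
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the denominator is 0
        scores.append(2 * tp[label] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: 8 majority-label tweets, 2 minority-label tweets,
# and a classifier that always predicts the majority label.
y_true = ["support"] * 8 + ["against"] * 2
y_pred = ["support"] * 10

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy = {accuracy:.2f}, macro-F1 = {macro_f1(y_true, y_pred):.2f}")
```

In this toy case the always-majority classifier is 80% accurate, but its macro-F1 is only about 0.44, because the minority label contributes an F1 of zero with the same weight as the majority label. A metric of this kind therefore rewards gains on minority stance labels directly.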