Paper Title

Are All Spurious Features in Natural Language Alike? An Analysis through a Causal Lens

Authors

Nitish Joshi, Xiang Pan, He He

Abstract

The term "spurious correlations" has been used in NLP to informally denote any undesirable feature-label correlations. However, a correlation can be undesirable because (i) the feature is irrelevant to the label (e.g., punctuation in a review), or (ii) the feature's effect on the label depends on the context (e.g., negation words in a review), which is ubiquitous in language tasks. In case (i), we want the model to be invariant to the feature, which is neither necessary nor sufficient for prediction. But in case (ii), even an ideal model (e.g., humans) must rely on the feature, since it is necessary (but not sufficient) for prediction. Therefore, a more fine-grained treatment of spurious features is needed to specify the desired model behavior. We formalize this distinction using a causal model and probabilities of necessity and sufficiency, which delineates the causal relations between a feature and a label. We then show that this distinction helps explain results of existing debiasing methods on different spurious features, and demystifies surprising results such as the encoding of spurious features in model representations after debiasing.
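
For readers unfamiliar with the terminology, the probabilities of necessity (PN) and sufficiency (PS) mentioned in the abstract are usually defined counterfactually, following Pearl. The sketch below gives those textbook definitions for a binary feature and a binary label. The symbols $X$, $Y$, and the counterfactual notation $Y_{X=x}$ are assumptions made here for illustration; the paper's own formalization may differ, for example by additionally conditioning on the surrounding context.

```latex
% X = 1: the feature is present in the input; Y = 1: the label of interest.
% Y_{X=x}: the value Y would take had X been set to x (a counterfactual).
\begin{align}
  \mathrm{PN} &= P\bigl(Y_{X=0} = 0 \mid X = 1,\ Y = 1\bigr), \\
  \mathrm{PS} &= P\bigl(Y_{X=1} = 1 \mid X = 0,\ Y = 0\bigr).
\end{align}
```

Read against the abstract, a feature of type (i) would have both quantities low, while a feature of type (ii) would have high PN but low PS.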
