Identifying Spurious Correlations for Robust Text Classification

Paper title

Identifying Spurious Correlations for Robust Text Classification

Authors

Zhao Wang, Aron Culotta

Abstract

The predictions of text classifiers are often driven by spurious correlations; for example, the term "Spielberg" correlates with positively reviewed movies, even though the term itself does not semantically convey a positive sentiment. In this paper, we propose a method to distinguish spurious and genuine correlations in text classification. We treat this as a supervised classification problem, using features derived from treatment effect estimators to distinguish spurious correlations from "genuine" ones. Due to the generic nature of these features and their small dimensionality, we find that the approach works well even with limited training examples, and that it is possible to transport the word classifier to new domains. Experiments on four datasets (sentiment classification and toxicity detection) suggest that using this approach to inform feature selection also leads to more robust classification, as measured by improved worst-case accuracy on the samples affected by spurious correlations.
