Outlier-Robust Group Inference via Gradient Space Clustering

Paper Title

Outlier-Robust Group Inference via Gradient Space Clustering

Authors

Yuchen Zeng, Kristjan Greenewald, Kangwook Lee, Justin Solomon, Mikhail Yurochkin

Abstract

Traditional machine learning models focus on achieving good performance on the overall training distribution, but they often underperform on minority groups. Existing methods can improve worst-group performance, but they have several limitations: (i) they require group annotations, which are often expensive and sometimes infeasible to obtain, and/or (ii) they are sensitive to outliers. Most related works fail to solve these two issues simultaneously because they focus on conflicting perspectives of minority groups and outliers. We address the problem of learning group annotations in the presence of outliers by clustering the data in the space of gradients of the model parameters. We show that data in the gradient space has a simpler structure while preserving information about minority groups and outliers, making it suitable for standard clustering methods like DBSCAN. Extensive experiments demonstrate that our method significantly outperforms the state of the art in terms of both group identification and downstream worst-group performance.
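To make the idea in the abstract concrete, the following is a minimal sketch of clustering in gradient space; it is an illustration only, not the authors' implementation. It assumes a logistic-regression model with hand-picked parameters `w` and `b` standing in for a trained model, a tiny hand-made dataset, Euclidean distance, and a small pure-Python DBSCAN. All names, parameter values, and data points are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def per_sample_gradient(x, y, w, b):
    """Gradient of the logistic loss at one example with respect to (w, b)."""
    residual = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
    return [residual * xi for xi in x] + [residual]

def dbscan(points, eps, min_samples):
    """Minimal DBSCAN; returns one label per point, with -1 meaning noise."""
    n = len(points)
    # Neighborhoods include the point itself, so min_samples counts it too.
    neighbors = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps]
        for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbors[i]) < min_samples:
            labels[i] = -1  # provisional noise; may become a border point later
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbors[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # border point: joins the cluster, is not expanded
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_samples:
                queue.extend(neighbors[j])  # core point: keep expanding
    return labels

# Hypothetical parameters standing in for a trained model.
w, b = [1.0, 0.0], 0.0

# Hand-made data: a well-fit majority group, a poorly-fit minority group
# with the same label, and one extreme point acting as an outlier.
majority = [((2.0 + 0.1 * i, 0.1 * j), 1) for i in range(-2, 3) for j in range(-1, 2)]
minority = [((-2.0 + 0.1 * i, 0.1 * j), 1) for i in range(-1, 2) for j in (-1, 1)]
outlier = [((8.0, 8.0), 0)]
data = majority + minority + outlier

# Cluster the per-sample gradients instead of the raw inputs.
grads = [per_sample_gradient(x, y, w, b) for x, y in data]
labels = dbscan(grads, eps=0.5, min_samples=3)
print(labels)
```

In this toy setting the two groups form separate dense clusters in gradient space, and the isolated point receives the noise label `-1`, which is how DBSCAN would flag an outlier. The `eps` and `min_samples` values were chosen by hand for this dataset; real data would need them tuned, and a library implementation of DBSCAN would normally replace the hand-written one.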
