Paper Title

Log-linear Guardedness and its Implications

Authors

Shauli Ravfogel, Yoav Goldberg, Ryan Cotterell

Abstract


Methods for erasing human-interpretable concepts from neural representations that assume linearity have been found to be tractable and useful. However, the impact of this removal on the behavior of downstream classifiers trained on the modified representations is not fully understood. In this work, we formally define the notion of log-linear guardedness as the inability of an adversary to predict the concept directly from the representation, and study its implications. We show that, in the binary case, under certain assumptions, a downstream log-linear model cannot recover the erased concept. However, we demonstrate that a multiclass log-linear model can be constructed that indirectly recovers the concept in some cases, pointing to the inherent limitations of log-linear guardedness as a downstream bias mitigation technique. These findings shed light on the theoretical limitations of linear erasure methods and highlight the need for further research on the connections between intrinsic and extrinsic bias in neural models.
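To make the linear-erasure setting concrete, the following is a minimal NumPy sketch, not the paper's construction: a binary concept z is assumed to be encoded along a single direction v in the representations, and erasure is performed by orthogonally projecting onto the nullspace of v, in the spirit of linear concept-removal methods. Afterwards, no coordinate of the representation is linearly correlated with z, which is the intuition behind guarding against a log-linear adversary.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (an illustrative assumption): representations X encode a
# binary concept z along a single unit direction v, plus Gaussian noise.
n, d = 2000, 8
z = rng.integers(0, 2, n)                      # binary protected concept
v = rng.normal(size=d)
v /= np.linalg.norm(v)                         # unit concept direction
X = rng.normal(size=(n, d)) + np.outer(2.0 * z - 1.0, v)

# Linear erasure: orthogonal projection onto the nullspace of v.
P = np.eye(d) - np.outer(v, v)
X_erased = X @ P

# Before erasure, the direction v linearly separates the two concept
# values; after erasure, every coordinate is (up to sampling noise)
# uncorrelated with z, so a linear probe has no signal to exploit.
corr_before = abs(np.corrcoef(X @ v, z)[0, 1])
corr_after = max(abs(np.corrcoef(X_erased[:, i], z)[0, 1])
                 for i in range(d))
```

The paper's multiclass result shows that this kind of guarantee is weaker than it looks: even when no single log-linear probe recovers z directly, a multiclass log-linear model over a downstream task can, in some cases, leak it indirectly.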
