Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks

Paper title

Modular and On-demand Bias Mitigation with Attribute-Removal Subnetworks

Authors

Lukas Hauzenberger, Shahed Masoudian, Deepak Kumar, Markus Schedl, Navid Rekabsaz

Abstract

Societal biases are reflected in large pre-trained language models and their fine-tuned versions on downstream tasks. Common in-processing bias mitigation approaches, such as adversarial training and mutual information removal, introduce additional optimization criteria, and update the model to reach a new debiased state. However, in practice, end-users and practitioners might prefer to switch back to the original model, or apply debiasing only on a specific subset of protected attributes. To enable this, we propose a novel modular bias mitigation approach, consisting of stand-alone highly sparse debiasing subnetworks, where each debiasing module can be integrated into the core model on-demand at inference time. Our approach draws from the concept of diff pruning, and proposes a novel training regime adaptable to various representation disentanglement optimizations. We conduct experiments on three classification tasks with gender, race, and age as protected attributes. The results show that our modular approach, while maintaining task performance, improves (or at least remains on-par with) the effectiveness of bias mitigation in comparison with baseline fine-tuning. Particularly on a two-attribute dataset, our approach with separately learned debiasing subnetworks shows effective utilization of either or both of the subnetworks for selective bias mitigation.
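The on-demand integration described in the abstract can be pictured with a small sketch. This is only an illustration under stated assumptions, not the authors' implementation: it assumes the core model's parameters are a flat list, that each debiasing subnetwork is stored as a sparse diff mapping parameter indices to deltas (the general idea behind diff pruning), and that several subnetworks are combined by simply adding their deltas. The names `apply_subnetworks`, `base`, and `diffs`, and all numeric values, are hypothetical.

```python
from typing import Dict, Iterable, List

# Hypothetical representation: a sparse diff maps a parameter index to the
# delta that a debiasing subnetwork adds at that position.
SparseDiff = Dict[int, float]


def apply_subnetworks(
    base: List[float],
    diffs: Dict[str, SparseDiff],
    active: Iterable[str],
) -> List[float]:
    """Return a copy of `base` with the selected sparse diffs added on top.

    Assumption: multiple subnetworks are merged by summing their deltas.
    """
    params = list(base)  # copy, so the core model itself is never modified
    for name in active:
        for index, delta in diffs[name].items():
            params[index] += delta
    return params


# Made-up core parameters and two made-up attribute subnetworks.
base = [0.5, -1.0, 2.0, 0.0]
diffs = {
    "gender": {1: 0.25},
    "age": {1: -0.5, 3: 1.0},
}

original = apply_subnetworks(base, diffs, [])              # no debiasing
gender_only = apply_subnetworks(base, diffs, ["gender"])   # one attribute
both = apply_subnetworks(base, diffs, ["gender", "age"])   # both attributes
```

Because the diffs are kept separate from the core parameters, switching back to the original model amounts to applying no diff at all, and selecting a subset of protected attributes amounts to choosing which diffs to add.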
