Paper title
Small Business Classification By Name: Addressing Gender and Geographic Origin Biases
Paper authors
Abstract
Small business classification is a difficult but important task in many applications, including customer segmentation. Training on small business names introduces gender and geographic origin biases. In this work, a model was developed that predicts one of 66 business types from the business name alone (top-1 F1-score = 60.2%). Two approaches to removing bias from this model are explored: replacing given names with a placeholder token, and augmenting the training data with gender-swapped examples. Results for both approaches are reported. Hiding given names from the model reduced its bias, but at the expense of classification performance (top-1 F1-score = 56.6%). On the evaluated dataset, augmenting the training data with gender-swapped samples proved less effective at reducing bias than the name-hiding approach.
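
The two debiasing approaches named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the tiny name lexicon `GENDER_SWAP`, the `<NAME>` placeholder string, and the helper names `hide_given_names`, `gender_swap`, and `augment` are all assumptions made for illustration. A real system would need a far larger given-name list and a principled way to pair names.

```python
import re

# Hypothetical toy lexicon mapping a given name to a counterpart of the
# other gender. Illustrative only; not taken from the paper.
GENDER_SWAP = {
    "john": "mary",
    "mary": "john",
    "mike": "anna",
    "anna": "mike",
}

# Assumed placeholder token; the abstract does not say which token was used.
PLACEHOLDER = "<NAME>"

# Split a string into alternating runs of letters and non-letters so that
# punctuation and spacing are preserved when tokens are rejoined.
_TOKEN = re.compile(r"[A-Za-z]+|[^A-Za-z]+")


def _match_case(source, target):
    """Return target cased like source (ALL CAPS, Capitalized, or unchanged)."""
    if source.isupper():
        return target.upper()
    if source[0].isupper():
        return target.capitalize()
    return target


def hide_given_names(business_name):
    """Approach 1: replace every known given name with a placeholder token."""
    parts = _TOKEN.findall(business_name)
    return "".join(PLACEHOLDER if p.lower() in GENDER_SWAP else p for p in parts)


def gender_swap(business_name):
    """Replace every known given name with its counterpart from the lexicon."""
    parts = _TOKEN.findall(business_name)
    return "".join(
        _match_case(p, GENDER_SWAP[p.lower()]) if p.lower() in GENDER_SWAP else p
        for p in parts
    )


def augment(examples):
    """Approach 2: keep the originals and append gender-swapped copies.

    examples is a list of (business_name, label) pairs. Names without a
    known given name produce no extra sample.
    """
    augmented = list(examples)
    for name, label in examples:
        swapped = gender_swap(name)
        if swapped != name:
            augmented.append((swapped, label))
    return augmented
```

Under these assumptions, `hide_given_names("John's Plumbing")` yields `"<NAME>'s Plumbing"`, so the classifier never sees the name, while `augment` adds `"Mary's Plumbing"` under the same label so that both variants appear in training. Name hiding removes the signal entirely, whereas augmentation only balances it, which is in keeping with the abstract's finding that augmentation reduced bias less.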