Paper Title
Revisiting the Importance of Amplifying Bias for Debiasing
Paper Authors
Paper Abstract
In image classification, "debiasing" aims to train a classifier to be less susceptible to dataset bias, i.e., a strong correlation between peripheral attributes of data samples and a target class. For example, even if the frog class in the dataset mainly consists of frog images with a swamp background (i.e., bias-aligned samples), a debiased classifier should be able to correctly classify a frog at a beach (i.e., a bias-conflicting sample). Recent debiasing approaches commonly use two components: a biased model $f_B$ and a debiased model $f_D$. $f_B$ is trained to focus on bias-aligned samples (i.e., overfitted to the bias), while $f_D$ is mainly trained with bias-conflicting samples by concentrating on samples which $f_B$ fails to learn, leading $f_D$ to be less susceptible to the dataset bias. While state-of-the-art debiasing techniques have aimed to better train $f_D$, we focus on training $f_B$, a component overlooked until now. Our empirical analysis reveals that removing bias-conflicting samples from the training set of $f_B$ is important for improving the debiasing performance of $f_D$. This is because the bias-conflicting samples act as noisy samples when amplifying the bias of $f_B$, since they do not include the bias attribute. To this end, we propose a simple yet effective data sample selection method which removes bias-conflicting samples to construct a bias-amplified dataset for training $f_B$. Our data sample selection method can be directly applied to existing reweighting-based debiasing approaches, obtaining a consistent performance boost and achieving state-of-the-art performance on both synthetic and real-world datasets.
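The two ideas in the abstract can be sketched in code. This is a minimal illustration, not the paper's implementation: the `relative_difficulty` weight follows the form popularized by reweighting-based debiasing (a sample gets a large weight for $f_D$ when $f_B$ fails on it), and `select_bias_amplified` is a hypothetical selection rule that drops the samples on which $f_B$ has the highest loss (likely bias-conflicting) to build a bias-amplified training set for $f_B$; the `keep_ratio` parameter is an assumption for illustration.

```python
def relative_difficulty(loss_b, loss_d, eps=1e-8):
    """Reweighting sketch: weight for training f_D on one sample.

    loss_b / loss_d are the per-sample losses of the biased model f_B
    and the debiased model f_D. Samples that f_B fails to learn
    (high loss_b) receive weights close to 1.
    """
    return loss_b / (loss_b + loss_d + eps)

def select_bias_amplified(losses_b, keep_ratio=0.8):
    """Hypothetical sample selection for the bias-amplified dataset.

    Keeps the indices of the `keep_ratio` fraction of samples with the
    LOWEST f_B loss (presumed bias-aligned), discarding high-loss
    samples (presumed bias-conflicting) so they do not act as noise
    when amplifying the bias of f_B.
    """
    order = sorted(range(len(losses_b)), key=lambda i: losses_b[i])
    k = int(len(losses_b) * keep_ratio)
    return order[:k]
```

For example, with per-sample $f_B$ losses `[0.1, 5.0, 0.2, 4.0]` and `keep_ratio=0.5`, the rule keeps indices `[0, 2]`, i.e., the two samples that $f_B$ fits easily.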