Paper Title
Mitigating and Evaluating Static Bias of Action Representations in the Background and the Foreground
Paper Authors
Paper Abstract
In video action recognition, shortcut static features can interfere with the learning of motion features, resulting in poor out-of-distribution (OOD) generalization. The video background is clearly a source of static bias, but the video foreground, such as the clothing of the actor, can also provide static bias. In this paper, we empirically verify the existence of foreground static bias by creating test videos with conflicting signals from the static and moving portions of the video. To tackle this issue, we propose a simple yet effective technique, StillMix, to learn robust action representations. Specifically, StillMix identifies bias-inducing video frames using a 2D reference network and mixes them with videos for training, serving as effective bias suppression even when we cannot explicitly extract the source of bias within each video frame or enumerate types of bias. Finally, to precisely evaluate static bias, we synthesize two new benchmarks, SCUBA for static cues in the background, and SCUFO for static cues in the foreground. With extensive experiments, we demonstrate that StillMix mitigates both types of static bias and improves video representations for downstream applications. Code is available at https://github.com/lihaoxin05/StillMix.
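The abstract describes mixing bias-inducing static frames with training videos. A minimal sketch of such a frame-video mixing step is below, assuming a mixup-style linear blend in which the selected static frame is repeated along the temporal axis to form a "still video"; the function name, tensor layout, and fixed mixing coefficient are illustrative assumptions, not the paper's exact implementation (StillMix additionally selects the frames with a 2D reference network and samples the coefficient during training).

```python
import numpy as np

def mix_with_still_frame(video, bias_frame, lam):
    """Blend a video clip with a static frame, mixup-style.

    video:      array of shape (T, H, W, C), the training clip
    bias_frame: array of shape (H, W, C), a frame flagged as bias-inducing
    lam:        mixing coefficient in [0, 1] (weight on the real video)

    The static frame is broadcast over time, so the motion signal comes
    only from `video` while the static appearance is perturbed.
    """
    still = np.broadcast_to(bias_frame, video.shape)  # repeat frame over T
    return lam * video + (1.0 - lam) * still

# Toy usage: an 8-frame all-zeros RGB clip mixed with an all-ones frame.
clip = np.zeros((8, 16, 16, 3), dtype=np.float32)
frame = np.ones((16, 16, 3), dtype=np.float32)
mixed = mix_with_still_frame(clip, frame, lam=0.7)
print(mixed.shape)        # (8, 16, 16, 3)
print(float(mixed.max())) # 0.3 (only the still-frame contribution remains)
```

Because the injected frame carries no motion, the mixed clip can keep the original video's action label, which is what makes this a bias-suppression augmentation rather than a label-mixing one.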