Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition

Paper Title

Signing Outside the Studio: Benchmarking Background Robustness for Continuous Sign Language Recognition

Authors

Youngjoon Jang, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, Joon Son Chung, In So Kweon

Abstract

The goal of this work is background-robust continuous sign language recognition. Most existing Continuous Sign Language Recognition (CSLR) benchmarks have fixed backgrounds and are filmed in studios with a static monochromatic background. However, signing is not limited only to studios in the real world. In order to analyze the robustness of CSLR models under background shifts, we first evaluate existing state-of-the-art CSLR models on diverse backgrounds. To synthesize the sign videos with a variety of backgrounds, we propose a pipeline to automatically generate a benchmark dataset utilizing existing CSLR benchmarks. Our newly constructed benchmark dataset consists of diverse scenes to simulate a real-world environment. We observe even the most recent CSLR method cannot recognize glosses well on our new dataset with changed backgrounds. In this regard, we also propose a simple yet effective training scheme including (1) background randomization and (2) feature disentanglement for CSLR models. The experimental results on our dataset demonstrate that our method generalizes well to other unseen background data with minimal additional training images.
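
To make the idea of background randomization concrete, the following is a minimal sketch of how a signer could be composited onto a randomly chosen scene. It is not the authors' pipeline, which the abstract does not detail. It assumes that a per-frame foreground mask for the signer is already available (for example from an off-the-shelf segmentation model), and the function names composite_background and randomize_video_background are hypothetical.

```python
import numpy as np


def composite_background(frame, mask, background):
    """Alpha-blend one signer frame over a new background.

    Assumed shapes (illustrative, not taken from the paper):
      frame, background: (H, W, 3) uint8 images of the same size
      mask: (H, W) float array in [0, 1], where 1 marks signer pixels
    """
    if frame.shape != background.shape or mask.shape != frame.shape[:2]:
        raise ValueError("frame, mask and background must share height and width")
    alpha = mask.astype(np.float32)[..., None]  # (H, W, 1), broadcasts over RGB
    blended = alpha * frame.astype(np.float32) + (1.0 - alpha) * background.astype(np.float32)
    return np.clip(np.rint(blended), 0, 255).astype(np.uint8)


def randomize_video_background(frames, masks, backgrounds, rng):
    """Pick one background at random and apply it to every frame of a clip.

    Using a single background per clip keeps the scene temporally consistent.
    Returns the new clip with shape (T, H, W, 3) and the chosen background index.
    """
    idx = int(rng.integers(len(backgrounds)))
    background = backgrounds[idx]
    clip = np.stack([composite_background(f, m, background) for f, m in zip(frames, masks)])
    return clip, idx
```

In a training loop, a function like randomize_video_background could be called once per sampled clip with rng = np.random.default_rng(seed), so the model sees the same signing against different scenes across epochs. The feature disentanglement component mentioned in the abstract is not covered by this sketch.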
