Paper Title

Multi-Modal Domain Adaptation for Fine-Grained Action Recognition

Paper Authors

Jonathan Munro, Dima Damen

Paper Abstract

Fine-grained action recognition datasets exhibit environmental bias, where multiple video sequences are captured from a limited number of environments. Training a model in one environment and deploying in another results in a drop in performance due to an unavoidable domain shift. Unsupervised Domain Adaptation (UDA) approaches have frequently utilised adversarial training between the source and target domains. However, these approaches have not explored the multi-modal nature of video within each domain. In this work we exploit the correspondence of modalities as a self-supervised alignment approach for UDA in addition to adversarial alignment. We test our approach on three kitchens from our large-scale dataset, EPIC-Kitchens, using two modalities commonly employed for action recognition: RGB and Optical Flow. We show that multi-modal self-supervision alone improves the performance over source-only training by 2.4% on average. We then combine adversarial training with multi-modal self-supervision, showing that our approach outperforms other UDA methods by 3%.
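The approach described in the abstract combines a supervised action-classification loss on labelled source clips with two unsupervised alignment losses that apply to both domains: a per-modality adversarial domain loss trained through a gradient reversal layer, and a self-supervised correspondence loss that predicts whether an (RGB, Flow) pair comes from the same clip. The sketch below is a minimal illustration of those two signals, not the authors' implementation: the `nn.Linear` layers stand in for per-modality video backbones (I3D in the paper), and all dimensions, head shapes, and names (`MultiModalUDA`, `correspondence_loss`, `lambd`) are hypothetical placeholders.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GradReverse(torch.autograd.Function):
    """Gradient reversal layer (Ganin & Lempitsky, 2015): identity on
    the forward pass, gradient scaled by -lambd on the backward pass,
    so the feature extractor learns to fool the domain classifier."""
    @staticmethod
    def forward(ctx, x, lambd):
        ctx.lambd = lambd
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        return -ctx.lambd * grad_out, None


def grad_reverse(x, lambd=1.0):
    return GradReverse.apply(x, lambd)


class MultiModalUDA(nn.Module):
    """Hypothetical sketch of the two alignment signals. The Linear
    layers stand in for per-modality video backbones; dimensions are
    placeholders, not the paper's architecture."""
    def __init__(self, in_dim=2048, feat_dim=1024, n_classes=8):
        super().__init__()
        self.f_rgb = nn.Linear(in_dim, feat_dim)    # RGB feature extractor
        self.f_flow = nn.Linear(in_dim, feat_dim)   # Flow feature extractor
        self.classifier = nn.Linear(2 * feat_dim, n_classes)
        self.d_rgb = nn.Linear(feat_dim, 2)     # per-modality domain heads
        self.d_flow = nn.Linear(feat_dim, 2)    # (source vs. target)
        self.corr = nn.Linear(2 * feat_dim, 2)  # correspondence head

    def forward(self, x_rgb, x_flow, lambd=1.0):
        z_rgb, z_flow = self.f_rgb(x_rgb), self.f_flow(x_flow)
        action_logits = self.classifier(torch.cat([z_rgb, z_flow], 1))
        # Adversarial alignment: domain heads receive reversed gradients.
        dom_rgb = self.d_rgb(grad_reverse(z_rgb, lambd))
        dom_flow = self.d_flow(grad_reverse(z_flow, lambd))
        return z_rgb, z_flow, action_logits, dom_rgb, dom_flow


def correspondence_loss(model, z_rgb, z_flow):
    """Multi-modal self-supervision: classify whether an (RGB, Flow)
    pair comes from the same clip. Negatives roll the Flow features by
    one position within the batch, so every negative pair is truly
    mismatched. No action labels are needed, so this loss applies to
    the unlabelled target domain as well as the source."""
    b = z_rgb.size(0)
    neg_flow = torch.roll(z_flow, shifts=1, dims=0)
    pos = model.corr(torch.cat([z_rgb, z_flow], 1))
    neg = model.corr(torch.cat([z_rgb, neg_flow], 1))
    labels = torch.cat([torch.ones(b, dtype=torch.long),
                        torch.zeros(b, dtype=torch.long)])
    return F.cross_entropy(torch.cat([pos, neg]), labels)
```

In training, the combined objective would be roughly the classification loss on labelled source clips plus weighted adversarial and correspondence losses over both domains; only the classification term needs labels, which is what lets the other two align features with the unlabelled target kitchen.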
