Paper Title
Identifying Auxiliary or Adversarial Tasks Using Necessary Condition Analysis for Adversarial Multi-task Video Understanding
Authors
Abstract
There has been an increasing interest in multi-task learning for video understanding in recent years. In this work, we propose a generalized notion of multi-task learning by incorporating both auxiliary tasks that the model should perform well on and adversarial tasks that the model should not perform well on. We employ Necessary Condition Analysis (NCA) as a data-driven approach for deciding what category these tasks should fall in. Our novel proposed framework, Adversarial Multi-Task Neural Networks (AMT), penalizes adversarial tasks, determined by NCA to be scene recognition in the Holistic Video Understanding (HVU) dataset, to improve action recognition. This upends the common assumption that the model should always be encouraged to do well on all tasks in multi-task learning. Simultaneously, AMT still retains all the benefits of multi-task learning as a generalization of existing methods and uses object recognition as an auxiliary task to aid action recognition. We introduce two challenging Scene-Invariant test splits of HVU, where the model is evaluated on action-scene co-occurrences not encountered in training. We show that our approach improves accuracy by ~3% and encourages the model to attend to action features instead of correlation-biasing scene features.
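The objective described in the abstract, one main task supported by an auxiliary task and penalized on an adversarial task, can be illustrated with a signed combination of per-task losses. The sketch below is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not specify how the penalty is applied, and the function names and the weights `lambda_aux` and `lambda_adv` are hypothetical. In adversarial setups of this kind, the penalty is commonly applied only to the shared features (for example through gradient reversal) while the adversarial head itself is still trained to classify; this sketch shows only the sign convention.

```python
import math


def cross_entropy(logits, label):
    """Softmax cross-entropy for a single example (numerically stable)."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def signed_multitask_loss(action, obj, scene, lambda_aux=1.0, lambda_adv=0.5):
    """Combine per-task losses with signed weights (hypothetical sketch).

    Each argument is a (logits, label) pair.
    - action: main task, loss is minimized.
    - obj:    auxiliary task, loss is added (the model should do well on it).
    - scene:  adversarial task, loss is subtracted (the model should not
              do well on it).
    lambda_aux and lambda_adv are assumed weights, not values from the paper.
    """
    loss_action = cross_entropy(*action)
    loss_object = cross_entropy(*obj)
    loss_scene = cross_entropy(*scene)
    return loss_action + lambda_aux * loss_object - lambda_adv * loss_scene


if __name__ == "__main__":
    total = signed_multitask_loss(
        action=([2.0, 0.5, -1.0], 0),
        obj=([0.1, 1.2], 1),
        scene=([0.3, 0.3, 0.3], 2),
    )
    print(f"combined loss: {total:.4f}")
```

With this sign convention, a confident correct scene prediction raises the combined loss, so lowering the objective pushes the shared representation away from scene cues while still rewarding action and object recognition.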