Paper Title


Graph2Vid: Flow graph to Video Grounding for Weakly-supervised Multi-Step Localization

Paper Authors

Nikita Dvornik, Isma Hadji, Hai Pham, Dhaivat Bhatt, Brais Martinez, Afsaneh Fazly, Allan D. Jepson

Paper Abstract

In this work, we consider the problem of weakly-supervised multi-step localization in instructional videos. An established approach to this problem is to rely on a given list of steps. In reality, however, there is often more than one way to execute a procedure successfully, by following the set of steps in slightly varying orders. Thus, for successful localization in a given video, recent works require the actual order of procedure steps in the video to be provided by human annotators at both training and test time. Instead, here, we rely only on generic procedural text that is not tied to a specific video. We represent the various ways to complete the procedure by transforming the list of instructions into a procedure flow graph, which captures the partial order of steps. Using flow graphs reduces both training- and test-time annotation requirements. To this end, we introduce the new problem of flow graph to video grounding. In this setup, we seek the optimal step ordering consistent with the procedure flow graph and a given video. To solve this problem, we propose a new algorithm, Graph2Vid, that infers the actual ordering of steps in the video and simultaneously localizes them. To show the advantage of our proposed formulation, we extend the CrossTask dataset with procedure flow graph information. Our experiments show that Graph2Vid is both more efficient than the baselines and yields strong step localization results, without the need for step order annotation.
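The core idea of the abstract can be illustrated with a minimal sketch. This is not the authors' Graph2Vid algorithm (which avoids enumerating orderings and is far more efficient); it is a brute-force toy under stated assumptions: the flow graph is given as a predecessor map over step indices, and a hypothetical frame-to-step similarity matrix stands in for video features. We enumerate all topological orderings of the graph and score each with a monotonic-alignment dynamic program, keeping the ordering that grounds best in the "video".

```python
import numpy as np

def topological_orders(preds):
    """Yield every step ordering consistent with the partial order.
    preds maps step -> set of steps that must come before it."""
    def rec(remaining, placed, order):
        if not remaining:
            yield list(order)
            return
        for s in sorted(remaining):
            if preds[s] <= placed:          # all predecessors already placed
                order.append(s)
                yield from rec(remaining - {s}, placed | {s}, order)
                order.pop()
    yield from rec(set(preds), set(), [])

def alignment_score(sim, order):
    """Best monotonic assignment of frames to steps taken in `order`.
    sim[t, s] = similarity of frame t to step s; each step gets at
    least one contiguous block of frames (a simple segmentation DP)."""
    T, K = sim.shape[0], len(order)
    NEG = float('-inf')
    dp = [[NEG] * (K + 1) for _ in range(T + 1)]
    dp[0][0] = 0.0
    for i in range(1, T + 1):
        for j in range(1, min(i, K) + 1):
            best = max(dp[i - 1][j], dp[i - 1][j - 1])  # stay on step j or advance
            if best > NEG:
                dp[i][j] = best + sim[i - 1, order[j - 1]]
    return dp[T][K]

def ground_graph(sim, preds):
    """Return (score, ordering) of the best grounding of the flow graph."""
    best_score, best_order = float('-inf'), None
    for order in topological_orders(preds):
        score = alignment_score(sim, order)
        if score > best_score:
            best_score, best_order = score, order
    return best_score, best_order

# Toy "diamond" flow graph: step 0 first, steps 1 and 2 in either
# order, step 3 last. The fake similarity matrix makes the video
# follow the order 0, 2, 1, 3 with two frames per step.
preds = {0: set(), 1: {0}, 2: {0}, 3: {1, 2}}
sim = np.zeros((8, 4))
sim[0:2, 0] = 1; sim[2:4, 2] = 1; sim[4:6, 1] = 1; sim[6:8, 3] = 1
score, order = ground_graph(sim, preds)
print(order, score)
```

The sketch recovers the ordering 0, 2, 1, 3 even though the linear instruction list would suggest 0, 1, 2, 3, which is exactly the ambiguity the flow-graph formulation is meant to resolve. Graph2Vid itself grounds the graph directly without the exponential enumeration used here.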
