Towards Frame Rate Agnostic Multi-Object Tracking

Paper title

Towards Frame Rate Agnostic Multi-Object Tracking

Authors

Weitao Feng, Lei Bai, Yongqiang Yao, Fengwei Yu, Wanli Ouyang

Abstract

Multi-Object Tracking (MOT) is one of the most fundamental computer vision tasks and contributes to various video analysis applications. Despite recent promising progress, current MOT research is still limited to a fixed sampling frame rate of the input stream. In fact, we empirically found that the accuracy of all recent state-of-the-art trackers drops dramatically when the input frame rate changes. For a more intelligent tracking solution, we shift the attention of our research work to the problem of Frame Rate Agnostic MOT (FraMOT), which takes frame rate insensitivity into consideration. In this paper, we propose a Frame Rate Agnostic MOT framework with a Periodic training Scheme (FAPS) to tackle the FraMOT problem for the first time. Specifically, we propose a Frame Rate Agnostic Association Module (FAAM) that infers and encodes the frame rate information to aid identity matching across multi-frame-rate inputs, improving the capability of the learned model in handling complex motion-appearance relations in FraMOT. Moreover, the association gap between training and inference is enlarged in FraMOT because post-processing steps that are not included in training make a larger difference in lower frame rate scenarios. To address this, we propose a Periodic Training Scheme (PTS) to reflect all post-processing steps in training via tracking pattern matching and fusion. Along with the proposed approaches, we make the first attempt to establish an evaluation method for this new task of FraMOT in two different modes, i.e., known frame rate and unknown frame rate, aiming to handle a more complex situation. The quantitative experiments on the challenging MOT17/20 dataset (FraMOT version) clearly demonstrate that the proposed approaches handle different frame rates better and thus improve robustness against complicated scenarios.
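
To make the frame rate sensitivity described in the abstract concrete, the following is a minimal Python sketch, not the authors' code. It assumes that a lower frame rate can be simulated by keeping every k-th frame of a sequence; the abstract does not state how the FraMOT version of MOT17/20 was built. The names `subsample`, `center`, `mean_step` and the `(x, y, width, height)` box layout are hypothetical and chosen only for illustration. The sketch shows that the displacement a tracker must bridge between consecutive frames grows in proportion to the sampling stride.

```python
from typing import List, Sequence, Tuple

# Hypothetical box layout for this sketch: (x, y, width, height).
Box = Tuple[float, float, float, float]


def subsample(frames: Sequence, stride: int) -> List:
    """Keep every `stride`-th frame, simulating original_fps / stride."""
    if stride < 1:
        raise ValueError("stride must be >= 1")
    return list(frames[::stride])


def center(box: Box) -> Tuple[float, float]:
    """Return the center point of a box."""
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)


def mean_step(track: Sequence[Box]) -> float:
    """Mean center displacement between consecutive boxes of one track."""
    if len(track) < 2:
        return 0.0
    total = 0.0
    for a, b in zip(track, track[1:]):
        (ax, ay), (bx, by) = center(a), center(b)
        total += ((bx - ax) ** 2 + (by - ay) ** 2) ** 0.5
    return total / (len(track) - 1)


# Toy track: one object moving 2 px per frame to the right at the original rate.
track = [(2.0 * t, 0.0, 10.0, 20.0) for t in range(31)]

for stride in (1, 5, 10):
    sampled = subsample(track, stride)
    # Columns: stride, number of frames kept, mean displacement in pixels.
    print(stride, len(sampled), mean_step(sampled))
```

With this toy track, the mean step grows from 2 px at the original rate to 20 px at a stride of 10. An association rule tuned for small inter-frame motion, such as a fixed distance or overlap threshold, would therefore need to adapt to the frame rate.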
