Paper Title
Learning Localization-aware Target Confidence for Siamese Visual Tracking
Paper Authors
Paper Abstract
The Siamese tracking paradigm has achieved great success, providing effective appearance discrimination and size estimation through classification and regression. However, such a paradigm typically optimizes classification and regression independently, which leads to task misalignment: accurate prediction boxes do not necessarily receive high target confidence scores. In this paper, to alleviate this misalignment, we propose a novel tracking paradigm called SiamLA. Within this paradigm, a series of simple yet effective localization-aware components are introduced to generate localization-aware target confidence scores. Specifically, with the proposed localization-aware dynamic label (LADL) loss and localization-aware label smoothing (LALS) strategy, collaborative optimization between classification and regression is achieved, making classification scores aware of the location state and not just of appearance similarity. In addition, we propose a separate localization branch, centered on a localization-aware feature aggregation (LAFA) module, that produces location quality scores used to further modify the classification scores. Consequently, the resulting target confidence scores are more discriminative for the location state, so that accurate prediction boxes tend to receive high scores. Extensive experiments are conducted on six challenging benchmarks: GOT-10k, TrackingNet, LaSOT, TNL2K, OTB100 and VOT2018. SiamLA achieves state-of-the-art performance in terms of both accuracy and efficiency. Furthermore, a stability analysis reveals that the tracking paradigm is relatively stable, which suggests that it has potential for real-world applications.
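The task misalignment described in the abstract can be illustrated with a toy example. The sketch below is not the SiamLA implementation: the abstract does not say how the location quality score modifies the classification score, so the multiplicative fusion, the `pick_box` helper, and all boxes and scores are hypothetical, chosen only to show how a candidate with the best classification score can be the worse-localized one.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def pick_box(candidates, use_quality):
    """Return the index of the highest-scoring candidate.

    With use_quality=True the classification score is multiplied by the
    location quality score (an assumed fusion rule, for illustration only).
    """
    def score(c):
        return c["cls"] * c["quality"] if use_quality else c["cls"]

    return max(range(len(candidates)), key=lambda i: score(candidates[i]))


# Hypothetical ground truth and two candidate predictions (toy numbers).
ground_truth = (10, 10, 50, 50)
candidates = [
    # High classification score, poorly localized box.
    {"box": (20, 20, 70, 70), "cls": 0.90, "quality": 0.40},
    # Slightly lower classification score, well localized box.
    {"box": (11, 11, 51, 51), "cls": 0.80, "quality": 0.95},
]

by_cls = pick_box(candidates, use_quality=False)
by_fused = pick_box(candidates, use_quality=True)

print("selected by classification only:", by_cls,
      "IoU =", round(iou(candidates[by_cls]["box"], ground_truth), 3))
print("selected by fused confidence:   ", by_fused,
      "IoU =", round(iou(candidates[by_fused]["box"], ground_truth), 3))
```

In this toy setting, ranking by the classification score alone selects the poorly localized box, while the fused score selects the box with the higher overlap. In a real tracker the quality score would come from a learned branch, not from hand-set values.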