BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video

Paper Title

BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video

Authors

Ali Athar, Jonathon Luiten, Paul Voigtlaender, Tarasha Khurana, Achal Dave, Bastian Leibe, Deva Ramanan

Abstract

Multiple existing benchmarks involve tracking and segmenting objects in video, e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and Segmentation (MOTS), but there is little interaction between them due to the use of disparate benchmark datasets and metrics (e.g., J&F, mAP, sMOTSA). As a result, published works usually target a particular benchmark and are not easily comparable to one another. We believe that the development of generalized methods that can tackle multiple tasks requires greater cohesion among these research sub-communities. In this paper, we aim to facilitate this by proposing BURST, a dataset which contains thousands of diverse videos with high-quality object masks, and an associated benchmark with six tasks involving object tracking and segmentation in video. All tasks are evaluated using the same data and comparable metrics, which enables researchers to consider them in unison and hence pool knowledge more effectively from different methods across different tasks. Additionally, we demonstrate several baselines for all tasks and show that approaches for one task can be applied to another with a quantifiable and explainable performance difference. Dataset annotations and evaluation code are available at: https://github.com/Ali2500/BURST-benchmark.
