mvHOTA: A multi-view higher order tracking accuracy metric to measure spatial and temporal associations in multi-point detection

Paper title

mvHOTA: A multi-view higher order tracking accuracy metric to measure spatial and temporal associations in multi-point detection

Authors

Lalith Sharan, Halvar Kelm, Gabriele Romano, Matthias Karck, Raffaele De Simone, Sandy Engelhardt

Abstract

Multi-point tracking is a challenging task that involves detecting points in a scene and tracking them across a sequence of frames. Computing detection-based measures such as the F-measure on a frame-by-frame basis is not sufficient to assess overall performance, because such measures do not capture performance in the temporal domain. The main evaluation metrics available come from multi-object tracking (MOT) methods, which benchmark performance on datasets such as KITTI. Among these, the recently proposed higher order tracking accuracy (HOTA) metric describes performance better than metrics such as MOTA, DetA, and IDF1. While HOTA takes temporal associations into account, it does not provide a tailored means to analyse the spatial associations of a dataset in a multi-camera setup. Moreover, evaluating the detection task differs for points and for objects (point distances vs. bounding box overlap). Therefore, in this work, we propose a multi-view higher order tracking metric (mvHOTA) to determine the accuracy of multi-point (multi-instance and multi-class) tracking methods while taking into account both temporal and spatial associations. mvHOTA can be interpreted as the geometric mean of detection, temporal, and spatial associations, thereby giving equal weight to each factor. We demonstrate the use of this metric to evaluate tracking performance on an endoscopic point detection dataset from a previously organised surgical data science challenge. Furthermore, we compare mvHOTA with other adjusted MOT metrics for this use case, discuss its properties, and show how the proposed multi-view association and the occlusion index (OI) facilitate the analysis of methods with respect to their handling of occlusions. The code is available at https://github.com/Cardio-AI/mvhota.
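The abstract describes mvHOTA as a geometric mean of detection, temporal association, and spatial association. The sketch below only illustrates that one sentence, assuming each component has already been computed as a score in [0, 1]. The function name `geometric_mean_score` and its parameter names are hypothetical and are not taken from the paper or from the Cardio-AI/mvhota repository. How the three components themselves are computed is not covered here.

```python
def geometric_mean_score(detection: float,
                         temporal_assoc: float,
                         spatial_assoc: float) -> float:
    """Return the geometric mean of three component scores.

    Illustrative only: the three inputs are assumed to be precomputed
    scores in [0, 1]; this is not the authors' implementation.
    """
    components = {
        "detection": detection,
        "temporal_assoc": temporal_assoc,
        "spatial_assoc": spatial_assoc,
    }
    for name, value in components.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    # Cube root of the product: each factor carries equal weight.
    return (detection * temporal_assoc * spatial_assoc) ** (1.0 / 3.0)


if __name__ == "__main__":
    # Strong detection and temporal association, weak spatial association.
    print(geometric_mean_score(0.8, 0.8, 0.1))
```

With inputs 0.8, 0.8, and 0.1 the geometric mean is approximately 0.4, whereas the arithmetic mean of the same values would be about 0.57. A combined score of this form therefore drops sharply when any single component is weak, which is one consequence of weighting the three factors equally in a product.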
