Paper title
VPIT: Real-time Embedded Single Object 3D Tracking Using Voxel Pseudo Images
Paper authors
Paper abstract
In this paper, we propose a novel voxel-based 3D single object tracking (3D SOT) method called Voxel Pseudo Image Tracking (VPIT). VPIT is the first method that uses voxel pseudo images for 3D SOT. The input point cloud is structured by pillar-based voxelization, and the resulting pseudo image is used as input to a 2D-like Siamese SOT method. The pseudo image is created in Bird's-eye View (BEV) coordinates, so the objects in it have constant size. Thus, only the object's rotation, and not its scale, can change in the new coordinate system. For this reason, we replace multi-scale search with a multi-rotation search, in which differently rotated search regions are compared against a single target representation to predict both the position and the rotation of the object. Experiments on the KITTI Tracking dataset show that VPIT is the fastest 3D SOT method and maintains competitive Success and Precision values. Applying an SOT method in a real-world scenario runs into limitations such as the lower computational capabilities of embedded devices and a latency-unforgiving environment, where the method is forced to skip data frames if its inference speed is not high enough. We implement a real-time evaluation protocol and show that other methods lose most of their performance on embedded devices, while VPIT maintains its ability to track the object.
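The frame-skipping effect described in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the paper's evaluation protocol, which the abstract does not detail. It assumes that frame i arrives at time i times the frame period, that the tracker always picks the newest frame available once it is free, and that skipped frames reuse the most recent prediction. The names `processed_frames` and `carry_forward`, and the 100 ms frame period in the usage lines, are hypothetical choices made for illustration.

```python
def processed_frames(num_frames, frame_period_ms, inference_ms):
    """Return the indices of frames a tracker manages to process in real time.

    Assumptions (illustrative, not taken from the paper): frame i arrives at
    time i * frame_period_ms, and a free tracker takes the newest frame that
    has already arrived, dropping any older unprocessed frames.
    """
    processed = []
    free_at = 0   # time (ms) at which the tracker finishes its current frame
    last = -1     # index of the last processed frame
    while True:
        newest = free_at // frame_period_ms   # newest frame arrived by free_at
        nxt = max(newest, last + 1)           # otherwise wait for the next one
        if nxt >= num_frames:
            break
        start = max(free_at, nxt * frame_period_ms)
        processed.append(nxt)
        last = nxt
        free_at = start + inference_ms
    return processed


def carry_forward(num_frames, processed, predictions):
    """Assign a prediction to every frame; skipped frames reuse the last one."""
    by_frame = dict(zip(processed, predictions))
    out, current = [], None
    for i in range(num_frames):
        if i in by_frame:
            current = by_frame[i]
        out.append(current)
    return out


# A tracker faster than the sensor sees every frame.
fast = processed_frames(10, frame_period_ms=100, inference_ms=50)
# A tracker 2.5x slower than the sensor skips more than half of them.
slow = processed_frames(10, frame_period_ms=100, inference_ms=250)
filled = carry_forward(10, slow, ["a", "b", "c", "d"])
```

Under these assumptions, a slow tracker is scored on stale predictions for every skipped frame. That is one plausible reading of why accuracy measured in real time on an embedded device can fall well below offline accuracy.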