Paper title
Unsupervised Spatio-temporal Latent Feature Clustering for Multiple-object Tracking and Segmentation
Authors
Abstract
Assigning consistent temporal identifiers to multiple moving objects in a video sequence is a challenging problem, and a solution would have immediate ramifications for multiple-object tracking and segmentation. We propose a strategy that treats the temporal identification task as a spatio-temporal clustering problem. Our unsupervised learning approach uses a convolutional and fully connected autoencoder, which we call the deep heterogeneous autoencoder, to learn discriminative features from segmentation masks and detection bounding boxes. We extract masks and their corresponding bounding boxes from a pretrained instance segmentation network and train the autoencoders jointly using task-dependent uncertainty weights to generate common latent features. We then construct constraint graphs that encourage associations among objects that satisfy a set of known temporal conditions. The feature vectors and the constraint graphs are then provided to the k-means clustering algorithm to separate the corresponding data points in the latent space. We evaluate the performance of our method on challenging synthetic and real-world multiple-object video datasets. Our results show that our technique outperforms several state-of-the-art methods.
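The clustering step described in the abstract can be illustrated with a small sketch. The abstract does not spell out the temporal conditions or how the constraint graphs enter k-means, so the code below makes several assumptions that should not be read as the authors' method: the only constraint is a cannot-link rule (two detections from the same frame may not share a track), the constraint is enforced by a greedy per-frame assignment, the centroids are seeded from the first k detections, and the latent vectors are given as plain lists. The function name `constrained_kmeans` and its parameters are hypothetical.

```python
import math
from collections import defaultdict


def constrained_kmeans(features, frames, k, n_iter=20):
    """Cluster latent vectors into k tracks under a same-frame cannot-link rule.

    features: list of equal-length float lists (hypothetical latent vectors).
    frames:   frame index of each detection, aligned with features.
    Returns one cluster label (track id) per detection.
    """
    by_frame = defaultdict(list)
    for idx, frame in enumerate(frames):
        by_frame[frame].append(idx)
    if max(len(members) for members in by_frame.values()) > k:
        raise ValueError("k must be at least the largest per-frame detection count")

    # Assumed initialisation: the first k detections seed the centroids.
    centroids = [list(features[i]) for i in range(k)]
    labels = [-1] * len(features)

    for _ in range(n_iter):
        new_labels = [-1] * len(features)
        for members in by_frame.values():
            # Rank every (detection, cluster) pair in this frame by distance,
            # then accept pairs greedily so no cluster is used twice per frame.
            pairs = sorted(
                (math.dist(features[i], centroids[c]), i, c)
                for i in members
                for c in range(k)
            )
            used = set()
            for _, i, c in pairs:
                if new_labels[i] == -1 and c not in used:
                    new_labels[i] = c
                    used.add(c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            pts = [features[i] for i, lab in enumerate(labels) if lab == c]
            if pts:
                centroids[c] = [sum(col) / len(pts) for col in zip(*pts)]
    return labels


# Toy example: two objects observed in three frames (made-up 2-D features).
toy_features = [[0.0, 0.0], [5.0, 5.0], [0.2, 0.1], [5.1, 4.9], [0.4, 0.2], [4.8, 5.2]]
toy_frames = [0, 0, 1, 1, 2, 2]
track_ids = constrained_kmeans(toy_features, toy_frames, k=2)
```

The greedy assignment keeps the sketch dependency-free. An optimal per-frame assignment (for example, the Hungarian algorithm) would be a natural substitute, and soft constraints that only "encourage" associations, as the abstract phrases it, would need a penalty term instead of a hard rule.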