Iterate & Cluster: Iterative Semi-Supervised Action Recognition

Paper Title

Iterate & Cluster: Iterative Semi-Supervised Action Recognition

Authors

Jingyuan Li, Eli Shlizerman

Abstract

We propose a novel system for active semi-supervised feature-based action recognition. Given time sequences of features tracked during movements, our system clusters the sequences into actions. Our system is based on unsupervised encoder-decoder methods that have been shown to perform clustering by self-organization of their latent representation through the auto-regression task. These methods were tested on human action recognition benchmarks, where they outperformed non-feature-based unsupervised methods and achieved accuracy comparable to skeleton-based supervised methods. However, such methods rely on K-Nearest Neighbours (KNN) to associate sequences with actions, and general features with no annotated data would correspond to approximate clusters, which could be further enhanced. Our system proposes an iterative semi-supervised method to address this challenge and to actively learn the association of clusters and actions. The method utilizes the latent space embedding and clustering of the unsupervised encoder-decoder to guide the selection of sequences to be annotated in each iteration. In each iteration, the selection aims to enhance action recognition accuracy while choosing only a small number of sequences for annotation. We test the approach on human skeleton-based action recognition benchmarks, assuming that only annotations chosen by our method are available, and on mouse movement videos recorded in lab experiments. We show that our system can boost recognition performance with only a small percentage of annotations. The system can be used as an interactive annotation tool to guide labeling efforts for 'in the wild' videos of various objects and actions to reach robust recognition.
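The loop the abstract describes (cluster the latent embeddings, pick a few sequences to annotate, associate sequences with actions by nearest neighbours) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the latent embeddings `Z` have already been produced by some encoder, uses plain k-means and a "closest unlabeled point to each centroid" rule as stand-in clustering and selection strategies, and treats `oracle` as a hypothetical callback that returns the human annotation for a sequence index. All function names are illustrative.

```python
import numpy as np

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (stand-in clustering); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):  # leave empty clusters where they are
                centroids[j] = X[labels == j].mean(axis=0)
    # Recompute assignments so labels match the final centroids.
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dist.argmin(axis=1)

def select_queries(Z, labeled_idx, k, seed=0):
    """Assumed selection rule: per cluster, the unlabeled point nearest the centroid."""
    centroids, labels = kmeans(Z, k, seed=seed)
    already = set(labeled_idx)
    picks = []
    for j in range(k):
        members = [int(i) for i in np.flatnonzero(labels == j) if int(i) not in already]
        if not members:
            continue
        dist = np.linalg.norm(Z[members] - centroids[j], axis=1)
        picks.append(members[int(dist.argmin())])
    return picks

def knn_predict(Z, labeled_idx, labeled_y):
    """1-nearest-neighbour association of every embedding with an annotated one."""
    ref = Z[labeled_idx]
    dist = np.linalg.norm(Z[:, None, :] - ref[None, :, :], axis=2)
    return np.asarray(labeled_y)[dist.argmin(axis=1)]

def iterate_and_label(Z, oracle, k, n_rounds):
    """Each round requests at most k new annotations, then labels everything by 1-NN."""
    labeled_idx, labeled_y = [], []
    for r in range(n_rounds):
        for i in select_queries(Z, labeled_idx, k, seed=r):
            labeled_idx.append(i)
            labeled_y.append(oracle(i))  # hypothetical human annotation step
    return labeled_idx, knn_predict(Z, labeled_idx, labeled_y)
```

In this sketch the annotation budget is bounded by `k * n_rounds`, which mirrors the abstract's goal of requesting only a small number of annotations per iteration. Which sequences the actual system selects, and how it updates its representation between iterations, is not specified in the abstract.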
