Paper Title

Self-Supervised Human Activity Recognition by Augmenting Generative Adversarial Networks

Paper Authors

Mohammad Zaki Zadeh, Ashwin Ramesh Babu, Ashish Jaiswal, Fillia Makedon

Paper Abstract

This paper proposes a novel approach for augmenting a generative adversarial network (GAN) with a self-supervised task in order to improve its ability to encode video representations that are useful in downstream tasks such as human activity recognition. In the proposed method, input video frames are randomly transformed by different spatial transformations, such as rotation, translation, and shearing, or by temporal transformations such as shuffling the temporal order of frames. The discriminator is then encouraged to predict the applied transformation through an auxiliary loss. The results demonstrate the superiority of the proposed method over baseline methods in providing useful representations of videos for human activity recognition on datasets such as KTH, UCF101, and Ball-Drop. The Ball-Drop dataset was specifically designed to measure executive functions in children through physically and cognitively demanding tasks. Using features from the proposed method instead of the baseline methods increased top-1 classification accuracy by more than 4%. Moreover, an ablation study was performed to investigate the contribution of the different transformations to the downstream task.
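To make the idea in the abstract concrete, below is a minimal sketch (not the authors' implementation) of how a GAN discriminator over video clips could be given an auxiliary head that predicts which transformation was applied, trained with an extra cross-entropy term alongside the usual adversarial loss. The transformation set, network architecture, and the `aux_weight` loss weighting are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch of an auxiliary transformation-prediction loss for a
# video-GAN discriminator. Architecture and hyperparameters are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed transformation set (the paper mentions rotation, translation,
# shearing, and temporal shuffling; "identity" is added as a no-op class).
TRANSFORMS = ["identity", "rotation", "translation", "shearing", "shuffle"]

class Discriminator(nn.Module):
    def __init__(self, in_channels=3, num_transforms=len(TRANSFORMS)):
        super().__init__()
        # Small 3D-conv encoder over clips of shape (B, C, T, H, W).
        self.encoder = nn.Sequential(
            nn.Conv3d(in_channels, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv3d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool3d(1),
            nn.Flatten(),
        )
        self.real_fake_head = nn.Linear(128, 1)                 # standard GAN output
        self.transform_head = nn.Linear(128, num_transforms)    # auxiliary self-supervised head

    def forward(self, clip):
        feats = self.encoder(clip)
        return self.real_fake_head(feats), self.transform_head(feats)

def discriminator_loss(d, real_clip, fake_clip, transformed_clip,
                       transform_label, aux_weight=1.0):
    """Adversarial loss plus the auxiliary transformation-prediction loss."""
    real_logit, _ = d(real_clip)
    fake_logit, _ = d(fake_clip)
    adv = (F.binary_cross_entropy_with_logits(real_logit, torch.ones_like(real_logit))
           + F.binary_cross_entropy_with_logits(fake_logit, torch.zeros_like(fake_logit)))
    # Encourage the discriminator to recognize which transformation was applied.
    _, t_logits = d(transformed_clip)
    aux = F.cross_entropy(t_logits, transform_label)
    return adv + aux_weight * aux
```

In this reading of the abstract, the transformation-prediction term acts as a self-supervised regularizer on the discriminator, so the features it learns (later reused for activity recognition) must capture spatial and temporal structure of the clips rather than only real-versus-fake cues.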
