Paper Title
Joint Learning of Social Groups, Individuals Action and Sub-group Activities in Videos
Paper Authors
Abstract
The state-of-the-art solutions for human activity understanding from a video stream formulate the task as a spatio-temporal problem which requires joint localization of all individuals in the scene and classification of their actions or group activity over time. Who is interacting with whom (e.g., not everyone in a queue is interacting with everyone else) is often not predicted. There are scenarios where people are best split into sub-groups, which we call social groups, and each social group may be engaged in a different social activity. In this paper, we solve the problem of simultaneously grouping people by their social interactions, predicting their individual actions and the social activity of each social group, which we call the social task. Our main contributions are: i) we propose an end-to-end trainable framework for the social task; ii) our proposed method also sets state-of-the-art results on two widely adopted benchmarks for the traditional group activity recognition task (which assumes the individuals in the scene form a single group and predicts a single group activity label for the scene); iii) we introduce new annotations on an existing group activity dataset, re-purposing it for the social task.
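To make the task definition concrete, the output of the social task can be thought of as a partition of the detected individuals into social groups, with an action label per individual and an activity label per group. The following is a minimal illustrative sketch of that output structure; all class and label names here are hypothetical, not taken from the paper's code.

```python
from dataclasses import dataclass

@dataclass
class SocialGroup:
    member_ids: list   # indices of the individuals forming this social group
    activity: str      # social activity label predicted for the group

@dataclass
class SocialTaskPrediction:
    individual_actions: dict   # person id -> individual action label
    groups: list               # list of SocialGroup; should partition the scene

def is_valid_partition(pred, person_ids):
    """Every individual belongs to exactly one social group."""
    covered = [m for g in pred.groups for m in g.member_ids]
    return sorted(covered) == sorted(person_ids) and len(covered) == len(set(covered))

# Hypothetical queue scene: two people chatting form one social group,
# a third person waiting alone forms another.
pred = SocialTaskPrediction(
    individual_actions={0: "talking", 1: "talking", 2: "standing"},
    groups=[SocialGroup([0, 1], "conversation"), SocialGroup([2], "queueing")],
)
```

This contrasts with traditional group activity recognition, where `groups` would always be a single `SocialGroup` containing everyone in the scene with one scene-level activity label.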