Paper Title

Language-Assisted Deep Learning for Autistic Behaviors Recognition

Authors

Andong Deng, Taojiannan Yang, Chen Chen, Qian Chen, Leslie Neely, Sakiko Oyama

Abstract

Correctly recognizing the behaviors of children with Autism Spectrum Disorder (ASD) is vital for diagnosing autism and for timely early intervention. However, the observations that parents of autistic children record during treatment may not be accurate or objective. Automatic recognition systems based on computer vision and machine learning, deep learning in particular, can alleviate this issue to a large extent. Existing human action recognition models now achieve strong performance on challenging activity datasets, such as those covering daily activities and sports. However, the problem behaviors of children with ASD differ substantially from these general activities, and recognizing them via computer vision is less studied. In this paper, we first evaluate a strong action recognition baseline, Video Swin Transformer, on two autism behavior datasets (SSBD and ESBD) and show that it achieves high accuracy and outperforms previous methods by a large margin, demonstrating the feasibility of vision-based problem behavior recognition. Moreover, we propose language-assisted training to further improve action recognition performance. Specifically, we develop a two-branch multimodal deep learning framework that incorporates the "freely available" language description of each type of problem behavior. Experimental results demonstrate that the additional language supervision brings a clear performance gain on the autism problem behavior recognition task over using video information alone (a 3.49% improvement on ESBD and 1.46% on SSBD).
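
The abstract does not say how the language branch supervises the video branch. As a rough illustration only, the sketch below assumes one common design: a standard classification loss on the video branch plus an alignment loss that pulls each clip embedding toward the text embedding of its class description. The function names, the temperature, and the loss weight are all hypothetical and are not taken from the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def language_assisted_loss(video_emb, class_logits, text_embs, label,
                           temperature=0.1, weight=1.0):
    """Hypothetical combined loss (an assumption, not the authors' formulation).

    video_emb    : embedding of one clip from the video branch
    class_logits : per-class scores from the video classification head
    text_embs    : one embedding per class description (language branch)
    label        : index of the true behavior class
    """
    # Video-only branch: ordinary classification loss.
    cls_loss = cross_entropy(softmax(class_logits), label)
    # Language branch: score the clip against every class description
    # and ask the true class's description to be the closest one.
    sims = [cosine(video_emb, t) / temperature for t in text_embs]
    align_loss = cross_entropy(softmax(sims), label)
    return cls_loss + weight * align_loss


# Toy usage with made-up 2-D embeddings for two behavior classes.
text_embs = [[1.0, 0.0], [0.0, 1.0]]
loss = language_assisted_loss([0.9, 0.1], [2.0, 0.5], text_embs, label=0)
print(f"combined loss: {loss:.4f}")
```

Setting `weight=0.0` reduces this to video-only training, which is the kind of comparison the reported gains refer to.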
