Paper Title
DanHAR: Dual Attention Network For Multimodal Human Activity Recognition Using Wearable Sensors
Paper Authors
Paper Abstract
Human activity recognition (HAR) in ubiquitous computing has begun to incorporate attention into the context of deep neural networks (DNNs), in which rich sensing data from multimodal sensors such as accelerometers and gyroscopes is used to infer human activities. Recently, two attention methods were proposed in combination with Gated Recurrent Units (GRU) and Long Short-Term Memory (LSTM) networks, which can capture the dependencies of sensing signals in both the spatial and temporal domains simultaneously. However, recurrent networks often have weaker feature representation power than convolutional neural networks (CNNs). On the other hand, two attention mechanisms, i.e., hard attention and soft attention, have been applied in the temporal domain in combination with CNNs, paying more attention to the target activity within a long sequence. However, they can only tell where to focus and miss channel information, which plays an important role in deciding what to focus on. As a result, they fail to address the spatial-temporal dependencies of multimodal sensing signals, compared with attention-based GRU or LSTM. In this paper, we propose a novel dual attention method called DanHAR, which introduces a framework that blends channel attention and temporal attention on a CNN, demonstrating superiority in improving the comprehensibility of multimodal HAR. Extensive experiments on four public HAR datasets and a weakly labeled dataset show that DanHAR achieves state-of-the-art performance with negligible parameter overhead. Furthermore, a visualization analysis is provided to show that our attention amplifies more important sensor modalities and timesteps during classification, which agrees well with common human intuition.
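To make the "channel attention plus temporal attention on a CNN" idea concrete, the sketch below shows one plausible realization in PyTorch. It is a minimal illustration only, assuming CBAM-style attention modules over 1-D CNN feature maps of shape (batch, channels, timesteps); the module names, reduction ratio, and kernel size are illustrative assumptions, not the authors' exact DanHAR implementation.

```python
# Hypothetical sketch of dual (channel + temporal) attention for sensor-based HAR.
# Not the paper's exact architecture; CBAM-style modules are assumed here.
import torch
import torch.nn as nn


class ChannelAttention(nn.Module):
    """Channel attention: decides *what* (which sensor/feature channels) to focus on."""

    def __init__(self, channels: int, reduction: int = 8):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        avg = self.mlp(x.mean(dim=2))   # global average pooling over time
        mx = self.mlp(x.amax(dim=2))    # global max pooling over time
        scale = torch.sigmoid(avg + mx).unsqueeze(2)  # (batch, channels, 1)
        return x * scale                # reweight channels


class TemporalAttention(nn.Module):
    """Temporal attention: decides *where* (which timesteps) to focus."""

    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv1d(2, 1, kernel_size, padding=kernel_size // 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, channels, time)
        avg = x.mean(dim=1, keepdim=True)   # (batch, 1, time)
        mx = x.amax(dim=1, keepdim=True)    # (batch, 1, time)
        scale = torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))
        return x * scale                    # reweight timesteps


class DualAttentionBlock(nn.Module):
    """Channel attention followed by temporal attention on CNN feature maps."""

    def __init__(self, channels: int):
        super().__init__()
        self.channel_att = ChannelAttention(channels)
        self.temporal_att = TemporalAttention()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.temporal_att(self.channel_att(x))


# Usage: attend over CNN features extracted from a multimodal sensor window.
features = torch.randn(4, 64, 128)      # (batch, channels, timesteps)
out = DualAttentionBlock(64)(features)
print(out.shape)                        # torch.Size([4, 64, 128])
```

Because both modules only rescale the feature map with sigmoid weights computed from pooled statistics, they add very few parameters relative to the CNN backbone, which is consistent with the abstract's claim of negligible parameter overhead.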