Paper Title
Unsupervised Multi-Modal Representation Learning for Affective Computing with Multi-Corpus Wearable Data
Paper Authors
Paper Abstract
With recent developments in smart technologies, there has been a growing focus on the use of artificial intelligence and machine learning for affective computing to further enhance the user experience through emotion recognition. Typically, machine learning models used for affective computing are trained using manually extracted features from biological signals. Such features may not generalize well to large datasets and may be sub-optimal in capturing the information in the raw input data. One approach to address this issue is to use fully supervised deep learning methods to learn latent representations of the biosignals. However, this approach requires human supervision to label the data, which may be unavailable or difficult to obtain. In this work, we propose an unsupervised framework to reduce the reliance on human supervision. The proposed framework utilizes two stacked convolutional autoencoders to learn latent representations from wearable electrocardiogram (ECG) and electrodermal activity (EDA) signals. These representations are then used within a random forest model for binary arousal classification. This approach reduces human supervision and enables the aggregation of datasets, allowing for higher generalizability. To validate the framework, an aggregated dataset comprising the AMIGOS, ASCERTAIN, CLEAS, and MAHNOB-HCI datasets is created. The results of our proposed method are compared with those obtained using convolutional neural networks, as well as with methods that employ manually extracted hand-crafted features. The methodology used for fusing the two modalities is also investigated. Lastly, we show that our method outperforms current state-of-the-art results for arousal detection on the same datasets using ECG and EDA biosignals. The results demonstrate the broad applicability of stacked convolutional autoencoders used with machine learning for affective computing.
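To make the pipeline described in the abstract concrete, the following is a minimal illustrative sketch, not the authors' implementation: a stacked 1-D convolutional autoencoder is trained without labels to reconstruct biosignal windows, the latent codes from separate ECG and EDA autoencoders are concatenated (one possible feature-level fusion strategy), and a random forest performs binary arousal classification. All layer sizes, window lengths, and training settings are assumptions made for illustration.

```python
# Illustrative sketch only: stacked 1-D convolutional autoencoders for ECG and
# EDA, with a random forest on the fused latent codes. Architecture details,
# window length, and hyperparameters are assumptions, not the paper's settings.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class ConvAutoencoder(nn.Module):
    """Stacked 1-D convolutional autoencoder for a single biosignal channel."""
    def __init__(self, latent_dim=32, window=1280):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=7, stride=2, padding=3), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * (window // 8), latent_dim),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 64 * (window // 8)),
            nn.Unflatten(1, (64, window // 8)),
            nn.ConvTranspose1d(64, 32, 7, stride=2, padding=3, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(32, 16, 7, stride=2, padding=3, output_padding=1), nn.ReLU(),
            nn.ConvTranspose1d(16, 1, 7, stride=2, padding=3, output_padding=1),
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), z

def train_autoencoder(model, signals, epochs=10, lr=1e-3):
    """Unsupervised training: reconstruct raw signal windows (no labels needed)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        recon, _ = model(signals)
        loss = loss_fn(recon, signals)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return model

if __name__ == "__main__":
    # Synthetic placeholder windows standing in for the aggregated multi-corpus data.
    n, window = 256, 1280
    ecg = torch.randn(n, 1, window)
    eda = torch.randn(n, 1, window)
    arousal = np.random.randint(0, 2, size=n)  # binary arousal labels

    ae_ecg = train_autoencoder(ConvAutoencoder(), ecg)
    ae_eda = train_autoencoder(ConvAutoencoder(), eda)

    with torch.no_grad():
        z_ecg = ae_ecg.encoder(ecg).numpy()
        z_eda = ae_eda.encoder(eda).numpy()

    # Feature-level fusion: concatenate the two latent representations before
    # the classifier (only one of several possible fusion strategies).
    fused = np.concatenate([z_ecg, z_eda], axis=1)
    clf = RandomForestClassifier(n_estimators=100).fit(fused, arousal)
    print("train accuracy:", clf.score(fused, arousal))
```

In this sketch the autoencoders never see the arousal labels; only the downstream random forest is supervised, which is what allows unlabeled recordings from multiple corpora to be pooled for representation learning.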