Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

Paper title

Self-Relation Attention and Temporal Awareness for Emotion Recognition via Vocal Burst

Authors

Dang-Linh Trinh, Minh-Cong Vo, Guee-Sang Lee

Abstract

This technical report presents our emotion recognition pipeline for the high-dimensional emotion task (A-VB High) in the ACII Affective Vocal Bursts (A-VB) 2022 Workshop & Competition. Our proposed method contains three stages. First, we extract latent features from the raw audio signal and its Mel-spectrogram using self-supervised learning methods. Then, the features from the raw signal are fed to the self-relation attention and temporal awareness (SA-TA) module to learn the valuable information between these latent features. Finally, we concatenate all the features and use a fully connected layer to predict each emotion's score. In empirical experiments, our proposed method achieves a mean concordance correlation coefficient (CCC) of 0.7295 on the test set, compared to 0.5686 for the baseline model. The code for our method is available at https://github.com/linhtd812/A-VB2022.
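
The abstract reports its results as a mean CCC. The following is a minimal sketch of how such a score could be computed, assuming the standard definition of the concordance correlation coefficient with population (biased) variances and a plain average over the per-emotion scores. The function names `concordance_cc` and `mean_ccc` are illustrative and are not taken from the authors' repository, whose evaluation code may differ in detail.

```python
def concordance_cc(y_true, y_pred):
    """Concordance correlation coefficient between two equal-length sequences.

    Assumed (standard) definition:
        CCC = 2 * cov(t, p) / (var(t) + var(p) + (mean(t) - mean(p)) ** 2)
    using population statistics (division by n).
    """
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")

    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n

    denom = var_t + var_p + (mean_t - mean_p) ** 2
    if denom == 0:
        # Both sequences are constant and identical; CCC is undefined here.
        raise ValueError("CCC is undefined for identical constant sequences")
    return 2 * cov / denom


def mean_ccc(rows_true, rows_pred):
    """Average the per-emotion CCC over all emotion columns.

    Each argument is a list of rows, one row per sample and one column
    per emotion (hypothetical layout chosen for this sketch).
    """
    cols_true = list(zip(*rows_true))
    cols_pred = list(zip(*rows_pred))
    scores = [concordance_cc(t, p) for t, p in zip(cols_true, cols_pred)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    truth = [[1, 1], [2, 2], [3, 3]]
    preds = [[1, 2], [2, 3], [3, 4]]
    print(mean_ccc(truth, preds))
```

Unlike Pearson correlation, CCC penalizes a constant offset: in the example, the second column of `preds` is perfectly correlated with the ground truth but shifted by one, so its CCC falls below 1.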
