Paper Title
Multi-modal Emotion Estimation for in-the-wild Videos
Paper Authors
Paper Abstract
In this paper, we briefly introduce our submission to the Valence-Arousal Estimation Challenge of the 3rd Affective Behavior Analysis in-the-wild (ABAW) competition. Our method utilizes multi-modal information, i.e., visual and audio information, and employs a temporal encoder to model the temporal context in the videos. In addition, a smoothing processor is applied to obtain more reasonable predictions, and a model ensemble strategy is used to improve the performance of the proposed method. Experimental results show that our method achieves a concordance correlation coefficient (CCC) of 65.55% for valence and 70.88% for arousal on the validation set of the Aff-Wild2 dataset, which demonstrates the effectiveness of the proposed method.
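For readers unfamiliar with the evaluation metric, the following is a minimal sketch of how the CCC reported above can be computed, together with simple stand-ins for the ensemble and smoothing steps. The abstract does not specify how either step is implemented, so the per-frame averaging in `ensemble_mean` and the centered moving average in `moving_average` are assumptions chosen only for illustration. All function names and the window size are hypothetical.

```python
from typing import List, Sequence


def concordance_cc(pred: Sequence[float], gold: Sequence[float]) -> float:
    """Concordance correlation coefficient (population statistics)."""
    if len(pred) != len(gold) or len(pred) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(pred)
    mean_p = sum(pred) / n
    mean_g = sum(gold) / n
    var_p = sum((p - mean_p) ** 2 for p in pred) / n
    var_g = sum((g - mean_g) ** 2 for g in gold) / n
    cov = sum((p - mean_p) * (g - mean_g) for p, g in zip(pred, gold)) / n
    denom = var_p + var_g + (mean_p - mean_g) ** 2
    if denom == 0.0:
        raise ValueError("CCC is undefined when both sequences are the same constant")
    return 2.0 * cov / denom


def ensemble_mean(per_model: Sequence[Sequence[float]]) -> List[float]:
    """Assumed ensemble rule: average the per-frame predictions of all models."""
    return [sum(frame) / len(frame) for frame in zip(*per_model)]


def moving_average(values: Sequence[float], window: int = 5) -> List[float]:
    """Assumed smoother: centered moving average, truncated at the sequence edges."""
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Hypothetical per-frame valence predictions from two models.
    model_a = [0.10, 0.30, 0.20, 0.50, 0.40]
    model_b = [0.20, 0.20, 0.40, 0.40, 0.60]
    labels = [0.15, 0.25, 0.30, 0.45, 0.50]

    fused = ensemble_mean([model_a, model_b])
    smoothed = moving_average(fused, window=3)
    print("CCC (fused):   ", concordance_cc(fused, labels))
    print("CCC (smoothed):", concordance_cc(smoothed, labels))
```

Unlike the Pearson correlation, the CCC penalizes differences in mean and scale between predictions and labels, so a prediction sequence that is perfectly correlated with the labels but offset from them scores below 1.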