Paper Title
SpeechEQ: Speech Emotion Recognition based on Multi-scale Unified Datasets and Multitask Learning
Paper Authors
Paper Abstract
Speech emotion recognition (SER) faces many challenges, one of the main ones being that each framework lacks a unified standard. In this paper, we propose SpeechEQ, a framework for unifying SER tasks based on a multi-scale unified metric. This metric can be trained by Multitask Learning (MTL), which includes two emotion recognition tasks, Emotion States Category (ESC) and Emotion Intensity Scale (EIS), and two auxiliary tasks, phoneme recognition and gender recognition. For this framework, we build a Mandarin SER dataset, the SpeechEQ Dataset (SEQD). We conducted experiments on the public Mandarin CASIA and ESD datasets, which show that our method outperforms baseline methods by a relatively large margin, yielding accuracy improvements of 8.0% and 6.5%, respectively. Additional experiments on IEMOCAP with four emotion categories (i.e., angry, happy, sad, and neutral) also show that the proposed method achieves state-of-the-art results, with a weighted accuracy (WA) of 78.16% and an unweighted accuracy (UA) of 77.47%.
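To make the multitask setup concrete, below is a minimal PyTorch sketch of one way to wire a shared speech encoder to the four heads named in the abstract (emotion category, emotion intensity, phoneme recognition, gender) and combine their losses into a single MTL objective. All module choices, dimensions, loss weights, and names (e.g., SpeechEQSketch, multitask_loss) are illustrative assumptions and not the paper's actual architecture or training recipe.

```python
# Illustrative sketch of a shared encoder with four task heads, as described
# in the abstract. Backbone, head sizes, and loss weights are assumptions.
import torch
import torch.nn as nn


class SpeechEQSketch(nn.Module):
    def __init__(self, feat_dim=80, hidden=256,
                 n_emotions=4, n_intensity=5, n_phonemes=60):
        super().__init__()
        # Shared acoustic encoder (a simple stand-in for the paper's backbone).
        self.encoder = nn.GRU(feat_dim, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        enc_dim = hidden * 2
        # Utterance-level heads: emotion category, intensity scale, gender.
        self.esc_head = nn.Linear(enc_dim, n_emotions)    # Emotion States Category
        self.eis_head = nn.Linear(enc_dim, n_intensity)   # Emotion Intensity Scale
        self.gender_head = nn.Linear(enc_dim, 2)
        # Frame-level auxiliary head: phoneme recognition.
        self.phoneme_head = nn.Linear(enc_dim, n_phonemes)

    def forward(self, feats):
        frames, _ = self.encoder(feats)        # (B, T, 2 * hidden)
        utt = frames.mean(dim=1)               # simple mean pooling over time
        return {
            "esc": self.esc_head(utt),
            "eis": self.eis_head(utt),
            "gender": self.gender_head(utt),
            "phoneme": self.phoneme_head(frames),
        }


def multitask_loss(outputs, targets, weights=(1.0, 1.0, 0.3, 0.3)):
    """Weighted sum of the two emotion losses and the two auxiliary losses."""
    ce = nn.CrossEntropyLoss()
    l_esc = ce(outputs["esc"], targets["esc"])
    l_eis = ce(outputs["eis"], targets["eis"])
    l_gender = ce(outputs["gender"], targets["gender"])
    # Flatten frame-level phoneme logits for a simple per-frame CE loss.
    l_phone = ce(outputs["phoneme"].flatten(0, 1), targets["phoneme"].flatten())
    w = weights
    return w[0] * l_esc + w[1] * l_eis + w[2] * l_phone + w[3] * l_gender


# Example forward/backward pass with random data (batch of 2, 100 frames
# of 80-dim features); all label ranges match the head sizes above.
if __name__ == "__main__":
    model = SpeechEQSketch()
    feats = torch.randn(2, 100, 80)
    targets = {
        "esc": torch.randint(0, 4, (2,)),
        "eis": torch.randint(0, 5, (2,)),
        "gender": torch.randint(0, 2, (2,)),
        "phoneme": torch.randint(0, 60, (2, 100)),
    }
    loss = multitask_loss(model(feats), targets)
    loss.backward()
```

The per-task weights in multitask_loss are placeholders; in practice the balance between the emotion tasks and the auxiliary phoneme/gender tasks would be tuned on a validation set.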