Deep Residual Local Feature Learning for Speech Emotion Recognition

Paper Title

Deep Residual Local Feature Learning for Speech Emotion Recognition

Authors

Singkul, Sattaya; Chatchaisathaporn, Thakorn; Suntisrivaraporn, Boontawee; Woraratpanya, Kuntpong

Abstract

Speech emotion recognition (SER) is playing an increasingly important role in global business today as a way to improve service efficiency, for example in call center services. Recent SER systems are based on deep learning. However, the effectiveness of deep learning depends on the number of layers: the deeper the network, the higher the effectiveness. On the other hand, deeper networks suffer from the vanishing gradient problem, a low learning rate, and long training times. Therefore, this paper proposes a redesign of the existing local feature learning block (LFLB). The new design is called the deep residual local feature learning block (DeepResLFLB). DeepResLFLB consists of three cascaded blocks: an LFLB, a residual local feature learning block (ResLFLB), and a multilayer perceptron (MLP). The LFLB learns local correlations and extracts hierarchical correlations; DeepResLFLB uses residual learning to learn repeatedly and capture more detail in deeper layers while mitigating the vanishing gradient problem and reducing overfitting; and the MLP relates the learned features to estimate the probabilities of the predicted speech emotion and gender type. On two publicly available datasets, EMODB and RAVDESS, the proposed DeepResLFLB significantly improves performance as evaluated by standard metrics: accuracy, precision, recall, and F1-score.
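The abstract describes a cascade of an LFLB, a residual LFLB, and an MLP. The following is a minimal pure-Python sketch of that data flow on a single 1-D feature sequence, intended only to illustrate the idea; it is not the authors' implementation. Assumptions: each block uses a convolution, normalization, ReLU, and (for the LFLB) max-pooling layout; normalization is computed over the sequence as a simplified stand-in for batch normalization (which would normalize over a mini-batch); and all function names (`lflb`, `res_lflb`, `mlp`, and helpers), kernel values, the sequence length, and the class count are hypothetical. The residual block adds its input back onto the output of its convolutional branch, computing F(x) + x, which is the mechanism residual learning relies on to ease gradient flow in deeper stacks.

```python
import math
import random


def conv1d_same(x, kernel):
    """1-D cross-correlation with zero padding so the output length equals len(x).
    Assumes an odd-length kernel."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(kernel[j] * padded[i + j] for j in range(len(kernel)))
            for i in range(len(x))]


def normalize(x, eps=1e-5):
    """Zero-mean, unit-variance normalization over one sequence
    (simplified stand-in for batch normalization; no learned scale/shift)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def relu(x):
    return [max(0.0, v) for v in x]


def max_pool(x, size=2):
    """Non-overlapping max pooling; halves the length when size=2."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, size)]


def lflb(x, kernel):
    """Hypothetical LFLB: conv -> normalization -> ReLU -> max pooling."""
    return max_pool(relu(normalize(conv1d_same(x, kernel))))


def res_lflb(x, kernel1, kernel2):
    """Hypothetical residual block: two conv+normalization stages form F(x);
    the identity shortcut adds x back, then ReLU is applied.
    Padding keeps the length unchanged, so F(x) and x can be summed directly."""
    h = relu(normalize(conv1d_same(x, kernel1)))
    h = normalize(conv1d_same(h, kernel2))
    return relu([a + b for a, b in zip(h, x)])


def softmax(z):
    m = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def mlp(x, weights, biases):
    """Single dense layer followed by softmax, mapping features to class probabilities."""
    logits = [sum(w * v for w, v in zip(row, x)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Toy forward pass (all values hypothetical).
random.seed(0)
signal = [random.uniform(-1.0, 1.0) for _ in range(16)]  # stand-in for one feature row
feats = lflb(signal, [0.25, 0.5, 0.25])                  # length 16 -> 8
feats = res_lflb(feats, [0.1, 0.8, 0.1], [-0.2, 1.0, -0.2])  # length stays 8
n_classes = 7  # e.g. one output per emotion label (assumed)
weights = [[random.gauss(0.0, 0.1) for _ in feats] for _ in range(n_classes)]
biases = [0.0] * n_classes
probs = mlp(feats, weights, biases)
```

One property worth noting: if the second kernel of the residual branch is all zeros, `res_lflb` returns its non-negative input unchanged, so an added block can at least fall back to passing features through as-is. This is the usual intuition for why residual connections make deeper stacks easier to train.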