Neural Architecture Search for Speech Emotion Recognition

Paper Title

Neural Architecture Search for Speech Emotion Recognition

Authors

Wu, Xixin; Hu, Shoukang; Wu, Zhiyong; Liu, Xunying; Meng, Helen

Abstract

Deep neural networks have brought significant advancements to speech emotion recognition (SER). However, architecture design in SER is mainly based on expert knowledge and empirical (trial-and-error) evaluation, which is time-consuming and resource-intensive. In this paper, we propose applying neural architecture search (NAS) techniques to automatically configure SER models. To accelerate candidate architecture optimization, we propose a uniform path dropout strategy that encourages all candidate architecture operations to be equally optimized. Experimental results for two different neural structures on IEMOCAP show that NAS can improve SER performance (from 54.89% to 56.28%) while maintaining model parameter sizes. The proposed dropout strategy also shows superiority over previous approaches.
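
The abstract does not describe how uniform path dropout works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that, at each training step, every candidate operation on an edge is dropped with the same probability, and that the remaining operations are mixed with renormalised architecture weights. The function names `sample_kept_paths` and `mixed_output`, the drop probability, and the toy operations are all hypothetical.

```python
import random


def sample_kept_paths(num_ops, drop_prob, rng):
    """Pick which candidate operations stay active for one training step.

    Assumption: every operation is dropped with the same probability,
    so no candidate is favoured. At least one operation is always kept.
    """
    kept = [i for i in range(num_ops) if rng.random() >= drop_prob]
    if not kept:
        # Everything was dropped: fall back to one uniformly chosen path.
        kept = [rng.randrange(num_ops)]
    return kept


def mixed_output(x, ops, weights, kept):
    """Weighted sum over the kept operations, with weights renormalised."""
    total = sum(weights[i] for i in kept)
    return sum(weights[i] / total * ops[i](x) for i in kept)


# Toy candidate operations standing in for real layers (hypothetical).
ops = [lambda x: x, lambda x: 2.0 * x, lambda x: 0.0]
weights = [0.5, 0.3, 0.2]  # stand-in for softmaxed architecture weights

rng = random.Random(0)
kept = sample_kept_paths(len(ops), drop_prob=0.5, rng=rng)
y = mixed_output(2.0, ops, weights, kept)
```

Because the drop probability is identical for every operation, each candidate is active in roughly the same share of training steps, which is the "equally optimized" property the abstract names.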
