Paper Title
Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion
Authors
Abstract
Emotional voice conversion (EVC) traditionally targets the transformation of spoken utterances from one emotional state to another, with previous research mainly focusing on discrete emotion categories. This paper departs from the norm by introducing a novel perspective: the nuanced rendering of mixed emotions and enhanced control over emotional expression. To achieve this, we propose a novel EVC framework, Mixed-EVC, which leverages only discrete emotion training labels. We construct an attribute vector that encodes the relationships among these discrete emotions; the vector is predicted using a ranking-based support vector machine and then integrated into a sequence-to-sequence (seq2seq) EVC framework. Mixed-EVC not only learns to characterize the input emotional style but also quantifies its relevance to other emotions during training. As a result, users can assign these attributes to achieve their desired rendering of mixed emotions. Objective and subjective evaluations confirm the effectiveness of our approach in mixed emotion synthesis and control, and show that it surpasses traditional baselines in converting one discrete emotion to another.
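The ranking-based support vector machine mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the features, the pairing scheme, or the solver, so everything below is an assumption: the two-dimensional feature vectors are made-up toy values, the function names `train_rank_svm` and `attribute_score` are hypothetical, and the optimizer is a plain subgradient descent on the pairwise hinge loss rather than whatever the authors used. The idea shown is only that a linear ranking function can be fitted from discrete labels by requiring utterances of one emotion to score higher than utterances of another.

```python
import random


def train_rank_svm(pairs, dim, epochs=200, lr=0.05, reg=0.01, seed=0):
    """Fit a linear ranking function w with a pairwise hinge loss.

    Each pair is (x_hi, x_lo): x_hi should receive a higher score than x_lo.
    This is an illustrative subgradient-descent solver, not the paper's.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    pairs = list(pairs)
    for _ in range(epochs):
        rng.shuffle(pairs)
        for x_hi, x_lo in pairs:
            diff = [a - b for a, b in zip(x_hi, x_lo)]
            margin = sum(wi * di for wi, di in zip(w, diff))
            # L2 regularization step: shrink the weights slightly.
            w = [wi * (1.0 - lr * reg) for wi in w]
            # Hinge subgradient step when the pair violates the unit margin.
            if margin < 1.0:
                w = [wi + lr * di for wi, di in zip(w, diff)]
    return w


def attribute_score(w, x):
    """Score a feature vector; higher means stronger presence of the attribute."""
    return sum(wi * xi for wi, xi in zip(w, x))


# Toy, made-up features for two discrete emotion classes (assumption).
happy = [[0.9, 0.8], [0.8, 0.7], [1.0, 0.9]]
neutral = [[0.2, 0.3], [0.1, 0.2], [0.3, 0.1]]

# Every happy utterance should outrank every neutral one.
pairs = [(h, n) for h in happy for n in neutral]
w = train_rank_svm(pairs, dim=2)

happy_scores = [attribute_score(w, h) for h in happy]
neutral_scores = [attribute_score(w, n) for n in neutral]
```

Under these assumptions, training one such ranking function per emotion pair and stacking the resulting scores would give a vector of the kind the abstract describes, which a user could then set by hand at inference time to request a mixture.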