Privacy-preserving Speech Emotion Recognition through Semi-Supervised Federated Learning

Paper title

Privacy-preserving Speech Emotion Recognition through Semi-Supervised Federated Learning

Authors

Vasileios Tsouvalas, Tanir Ozcelebi, Nirvana Meratnia

Abstract

Speech Emotion Recognition (SER) refers to the recognition of human emotions from natural speech. If done accurately, it can offer a number of benefits in building human-centered, context-aware intelligent systems. Existing SER approaches are largely centralized and do not consider users' privacy. Federated Learning (FL) is a distributed machine learning paradigm that deals with the decentralization of privacy-sensitive personal data. In this paper, we present a privacy-preserving and data-efficient SER approach built on the concept of FL. To the best of our knowledge, this is the first federated SER approach that combines self-training with federated learning to exploit both labeled and unlabeled on-device data. Our experimental evaluations on the IEMOCAP dataset show that our federated approach can learn generalizable SER models even under low availability of data labels and highly non-i.i.d. distributions. We show that, with as little as 10% labeled data, our approach improves the recognition rate by 8.67% on average compared to its fully supervised federated counterparts.
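The abstract does not spell out how self-training and federated learning are combined. Purely as an illustration, the sketch below pairs confidence-thresholded pseudo-labeling on each client with FedAvg-style, sample-weighted averaging on the server. It is an assumption-laden toy, not the authors' implementation: a binary logistic-regression model on synthetic 2-D features stands in for an SER model (IEMOCAP involves several emotion classes), and every function name, the 0.9 confidence threshold, the learning rate, the number of rounds, and the data are hypothetical.

```python
import math
import random

# Hypothetical sketch: self-training (confidence-thresholded pseudo-labels)
# on each client plus FedAvg-style averaging on the server. A binary
# logistic-regression model on toy 2-D features stands in for an SER model.
# Not the paper's method; all names and hyperparameters are illustrative.

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict(w, x):
    # w holds one weight per feature; the last entry is the bias
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + w[-1])

def local_train(w, data, lr=0.1, epochs=20):
    # Plain SGD on binary cross-entropy; data is a list of (features, label)
    w = list(w)
    for _ in range(epochs):
        for x, y in data:
            err = predict(w, x) - y  # gradient of the loss w.r.t. the logit
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w

def pseudo_label(w, unlabeled, threshold=0.9):
    # Keep only unlabeled samples the current model is confident about
    out = []
    for x in unlabeled:
        p = predict(w, x)
        if p >= threshold:
            out.append((x, 1))
        elif p <= 1.0 - threshold:
            out.append((x, 0))
    return out

def client_update(w_global, labeled, unlabeled):
    # 1) fit on the few labels, 2) pseudo-label confident unlabeled samples,
    # 3) retrain on labeled + pseudo-labeled data; raw data stays on-device
    w = local_train(w_global, labeled)
    combined = labeled + pseudo_label(w, unlabeled)
    return local_train(w, combined), len(combined)

def fed_avg(updates):
    # Server side: average client weights, weighted by local sample count
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

def make_blob(rng, label, n):
    # Toy stand-in for acoustic features: class 0 near (-2,-2), class 1 near (2,2)
    c = 2.0 if label == 1 else -2.0
    return [([rng.gauss(c, 1.0), rng.gauss(c, 1.0)], label) for _ in range(n)]

rng = random.Random(0)
clients = []
for major in (0, 1):
    labeled = make_blob(rng, 0, 2) + make_blob(rng, 1, 2)  # only 4 labels
    # Non-i.i.d. unlabeled pool: 30 samples of one class, 10 of the other
    pool = make_blob(rng, major, 30) + make_blob(rng, 1 - major, 10)
    clients.append((labeled, [x for x, _ in pool]))  # labels discarded

w = [0.0, 0.0, 0.0]
for _ in range(5):  # communication rounds
    w = fed_avg([client_update(w, lab, unl) for lab, unl in clients])

test = make_blob(rng, 0, 100) + make_blob(rng, 1, 100)
acc = sum((predict(w, x) >= 0.5) == (y == 1) for x, y in test) / len(test)
print(f"test accuracy: {acc:.2f}")
```

Each toy client holds only a handful of labels and an unlabeled pool skewed toward one class, loosely mimicking the scarce-label, non-i.i.d. setting described above. Only model weights leave a client; the raw samples and pseudo-labels stay local.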
