SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning

Paper Title

SVLDL: Improved Speaker Age Estimation Using Selective Variance Label Distribution Learning

Authors

Kang, Zuheng; Wang, Jianzong; Peng, Junqing; Xiao, Jing

Abstract

Estimating a speaker's age from a single utterance is a classic and challenging problem. Although Label Distribution Learning (LDL) represents adjacent, indistinguishable ages well, the uncertainty of the age estimate for each utterance varies from person to person, i.e., the variance of the age distribution differs. To address this issue, we propose the selective variance label distribution learning (SVLDL) method, which adapts to the variance of different age distributions. Furthermore, the model uses WavLM as the speech feature extractor and adds gender recognition as an auxiliary task to further improve performance. Two tricks are applied to the loss function to enhance the robustness of the age estimation and improve the quality of the fitted age distribution. Extensive experiments show that the model achieves state-of-the-art performance on all aspects of the NIST SRE08-10 dataset and a real-world dataset.
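
As a rough illustration of the label distribution idea in the abstract, the sketch below turns a scalar age label into a discretized Gaussian over integer ages and compares candidate variances against a predicted distribution using KL divergence. The abstract does not say how SVLDL selects the variance, so the rule here (choose the candidate sigma with the lowest KL divergence) is an assumption made only for illustration, as are the function names, the 0 to 100 age range, and the candidate sigma values. It is not the authors' implementation.

```python
import math


def gaussian_label_distribution(age, sigma, min_age=0, max_age=100):
    """Discretized Gaussian over integer ages, normalized to sum to 1."""
    ages = range(min_age, max_age + 1)
    weights = [math.exp(-0.5 * ((a - age) / sigma) ** 2) for a in ages]
    total = sum(weights)
    return [w / total for w in weights]


def kl_divergence(target, predicted, eps=1e-12):
    """KL(target || predicted) for two discrete distributions of equal length."""
    return sum(
        t * math.log((t + eps) / (p + eps)) for t, p in zip(target, predicted)
    )


def select_variance(age, predicted, sigmas=(1.0, 2.0, 4.0, 8.0)):
    """Hypothetical selection rule (an assumption, not taken from the paper):
    keep the candidate sigma whose target distribution is closest to the
    model's predicted distribution."""
    losses = {
        s: kl_divergence(gaussian_label_distribution(age, s), predicted)
        for s in sigmas
    }
    best = min(losses, key=losses.get)
    return best, losses[best]


def expected_age(dist, min_age=0):
    """Point estimate of age as the mean of the distribution."""
    return sum((min_age + i) * p for i, p in enumerate(dist))


if __name__ == "__main__":
    # Stand-in for a model output: a distribution centered on age 30.
    predicted = gaussian_label_distribution(30, 4.0)
    best_sigma, loss = select_variance(30, predicted)
    print(best_sigma, loss, expected_age(predicted))
```

In a real training setup the predicted distribution would come from a softmax over age bins on top of the speech encoder, and the KL term would be one part of the loss alongside the gender recognition task.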
