Paper Title

Transport-Oriented Feature Aggregation for Speaker Embedding Learning

Authors

Yusheng Tian, Jingyu Li, Tan Lee

Abstract

Pooling is needed to aggregate frame-level features into utterance-level representations for speaker modeling. Given the success of statistics-based pooling methods, we hypothesize that speaker characteristics are well represented by the statistical distribution of the pre-aggregation layer's outputs, and we propose to use transport-oriented feature aggregation for deriving speaker embeddings. The aggregated representation encodes the geometric structure of the underlying feature distribution, which is expected to contain valuable speaker-specific information that may not be captured by commonly used statistical measures such as the mean and variance. The original transport-oriented feature aggregation is also extended to a weighted-frame version to incorporate the attention mechanism. Speaker verification experiments on the VoxCeleb dataset show improvements over statistics pooling and its attentive variant.
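
The abstract does not give the exact aggregation formula, so the following is only a minimal sketch of the idea, not the authors' method. It contrasts (attentive) statistics pooling with one transport-based alternative swapped in for illustration: a sliced-Wasserstein-style embedding. Each frame-level feature is projected onto a fixed direction; in one dimension, the optimal transport map from a uniform reference distribution to the projected frame distribution is that distribution's quantile function, so evaluating the (weighted) quantile function at fixed levels encodes the shape of the distribution rather than only its first two moments. Assumptions: frames are plain lists of floats, frame weights (uniform, or attention scores from some unspecified attention layer) are given, and every name here (`stats_pool`, `random_directions`, `weighted_quantile`, `sliced_transport_pool`, `num_quantiles`) is hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def normalize(weights: Sequence[float]) -> List[float]:
    """Rescale non-negative frame weights (uniform or attention scores) to sum to 1."""
    total = float(sum(weights))
    if total <= 0.0:
        raise ValueError("frame weights must have a positive sum")
    return [w / total for w in weights]


def stats_pool(frames: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """(Attentive) statistics pooling: per-dimension weighted mean and std, concatenated."""
    w = normalize(weights)
    dim = len(frames[0])
    mean = [sum(wt * f[d] for wt, f in zip(w, frames)) for d in range(dim)]
    var = [sum(wt * (f[d] - mean[d]) ** 2 for wt, f in zip(w, frames)) for d in range(dim)]
    # A small floor keeps the square root numerically safe for constant dimensions.
    return mean + [math.sqrt(max(v, 1e-12)) for v in var]


def random_directions(dim: int, num_slices: int, seed: int = 0) -> List[Vector]:
    """Hypothetical fixed slicing directions: random unit vectors in R^dim."""
    rng = random.Random(seed)
    directions = []
    for _ in range(num_slices):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        norm = math.sqrt(sum(x * x for x in v))
        directions.append([x / norm for x in v])
    return directions


def weighted_quantile(values: Sequence[float], weights: Sequence[float], q: float) -> float:
    """Generalized inverse CDF: smallest value whose cumulative weight reaches q."""
    cumulative = 0.0
    for value, weight in sorted(zip(values, weights)):
        cumulative += weight
        if cumulative >= q:
            return value
    return max(values)  # only reached if round-off leaves the total weight just below q


def sliced_transport_pool(
    frames: Sequence[Vector],
    weights: Sequence[float],
    directions: Sequence[Vector],
    num_quantiles: int,
) -> Vector:
    """Concatenate, per direction, the weighted quantiles of the projected frames."""
    w = normalize(weights)
    # Uniform reference measure on num_quantiles points: levels (m + 0.5) / M.
    levels = [(m + 0.5) / num_quantiles for m in range(num_quantiles)]
    embedding: Vector = []
    for theta in directions:
        # 1-D projection of every frame-level feature onto this direction.
        projected = [sum(t * x for t, x in zip(theta, f)) for f in frames]
        # In 1-D, the optimal transport map from the reference to the projected
        # distribution is its quantile function, evaluated here at the reference levels.
        embedding.extend(weighted_quantile(projected, w, q) for q in levels)
    return embedding
```

For example, the 1-D frame sets [-1, -1, 1, 1] and [-√2, 0, 0, √2] both have mean 0 and standard deviation 1, so `stats_pool` with uniform weights cannot tell them apart, whereas `sliced_transport_pool` with the single direction [1.0] and four quantile levels returns [-1, -1, 1, 1] and [-√2, 0, 0, √2] respectively. Passing non-uniform `weights` gives a weighted-frame variant in the spirit of the attention extension. Some sliced-Wasserstein embeddings also subtract fixed reference points from each quantile vector; that only adds a constant offset and is omitted here.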
