Transport-Oriented Feature Aggregation for Speaker Embedding Learning

Paper Title

Transport-Oriented Feature Aggregation for Speaker Embedding Learning

Authors

Yusheng Tian, Jingyu Li, Tan Lee

Abstract

Pooling is needed to aggregate frame-level features into utterance-level representations for speaker modeling. Given the success of statistics-based pooling methods, we hypothesize that speaker characteristics are well represented in the statistical distribution over the pre-aggregation layer's output, and propose to use transport-oriented feature aggregation for deriving speaker embeddings. The aggregated representation encodes the geometric structure of the underlying feature distribution, which is expected to contain valuable speaker-specific information that may not be represented by commonly used statistical measures such as mean and variance. The original transport-oriented feature aggregation is also extended to a weighted-frame version to incorporate the attention mechanism. Experiments on speaker verification with the VoxCeleb dataset show improvements over statistics pooling and its attentive variant.
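For orientation, the baselines named in the abstract (statistics pooling and its attentive variant) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function name `statistics_pooling`, the `eps` stabiliser, and the use of the standard deviation (the square root of the variance) are choices made here, and the frame weights are simply passed in rather than produced by an attention module. The transport-oriented aggregation itself is not reproduced, since the abstract does not specify it.

```python
import numpy as np


def statistics_pooling(frames, weights=None, eps=1e-8):
    """Pool a (T, D) array of frame-level features into a (2 * D,) vector.

    With weights=None every frame counts equally (plain statistics pooling).
    Otherwise `weights` is a length-T vector of non-negative frame weights,
    standing in for the output of an attention module (attentive variant).
    All names here are illustrative assumptions, not taken from the paper.
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    if weights is None:
        weights = np.full(num_frames, 1.0 / num_frames)
    else:
        weights = np.asarray(weights, dtype=np.float64)
        weights = weights / weights.sum()  # normalise so the weights sum to 1

    mean = weights @ frames                    # weighted mean per dimension, shape (D,)
    var = weights @ ((frames - mean) ** 2)     # weighted variance per dimension, shape (D,)
    std = np.sqrt(np.maximum(var, 0.0) + eps)  # eps keeps the square root stable
    return np.concatenate([mean, std])         # utterance-level vector, shape (2 * D,)


# Usage: 200 frames of 4-dimensional features become one 8-dimensional vector.
rng = np.random.default_rng(0)
frames = rng.normal(size=(200, 4))
embedding = statistics_pooling(frames)
attentive = statistics_pooling(frames, weights=rng.random(200))
```

Both variants summarise the frame distribution by its first two moments only, which is the limitation the abstract points to when it argues for encoding the geometric structure of the feature distribution.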

Paper Title