Paper title
Transport-Oriented Feature Aggregation for Speaker Embedding Learning
Paper authors
Paper abstract
Pooling is needed to aggregate frame-level features into utterance-level representations for speaker modeling. Given the success of statistics-based pooling methods, we hypothesize that speaker characteristics are well represented in the statistical distribution over the pre-aggregation layer's output, and propose to use transport-oriented feature aggregation for deriving speaker embeddings. The aggregated representation encodes the geometric structure of the underlying feature distribution, which is expected to contain valuable speaker-specific information that may not be captured by commonly used statistical measures such as mean and variance. The original transport-oriented feature aggregation is also extended to a weighted-frame version to incorporate the attention mechanism. Experiments on speaker verification with the VoxCeleb dataset show improvements over statistics pooling and its attentive variant.
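For orientation, the baseline that the abstract compares against, statistics pooling and its attentive (weighted-frame) variant, can be sketched as follows. This is a minimal NumPy illustration of the baseline only, not the proposed transport-oriented aggregation, whose details the abstract does not give. The function name `statistics_pooling`, the `weights` and `eps` parameters, and the array shapes are assumptions made for this example; in a real system the per-frame weights would come from a learned attention module.

```python
import numpy as np


def statistics_pooling(frames, weights=None, eps=1e-8):
    """Pool frame-level features of shape (T, D) into one (2 * D,) vector.

    weights: optional non-negative per-frame scores of shape (T,). They are
    normalized to sum to 1. None means uniform weights, i.e. plain
    statistics pooling. (Hypothetical interface, for illustration only.)
    """
    frames = np.asarray(frames, dtype=np.float64)
    num_frames = frames.shape[0]
    if weights is None:
        w = np.full(num_frames, 1.0 / num_frames)
    else:
        w = np.asarray(weights, dtype=np.float64)
        w = w / w.sum()
    mean = w @ frames                    # weighted mean over frames, shape (D,)
    var = w @ (frames - mean) ** 2       # weighted variance, shape (D,)
    std = np.sqrt(var + eps)             # eps keeps the square root stable
    return np.concatenate([mean, std])   # utterance-level representation


# Usage: 200 frames of 64-dimensional features pooled into a 128-dim vector.
rng = np.random.default_rng(0)
features = rng.standard_normal((200, 64))
scores = rng.random(200)                 # stand-in for learned attention scores
plain = statistics_pooling(features)
attentive = statistics_pooling(features, weights=scores)
print(plain.shape, attentive.shape)
```

Both variants summarize the frame distribution only through its first two moments, which is the limitation the abstract says the transport-oriented aggregation is meant to address.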