Title

Siamese x-vector reconstruction for domain adapted speaker recognition

Authors

Shai Rozenberg, Hagai Aronowitz, Ron Hoory

Abstract

With the rise of voice-activated applications, the need for speaker recognition is rapidly increasing. The x-vector, an embedding approach based on a deep neural network (DNN), is considered the state-of-the-art when proper end-to-end training is not feasible. However, the accuracy significantly decreases when recording conditions (noise, sample rate, etc.) are mismatched, either between the x-vector training data and the target data or between enrollment and test data. We introduce the Siamese x-vector Reconstruction (SVR) for domain adaptation. We reconstruct the embedding of a higher quality signal from a lower quality counterpart using a lean auxiliary Siamese DNN. We evaluate our method on several mismatch scenarios and demonstrate significant improvement over the baseline.
