Paper Title
openFEAT: Improving Speaker Identification by Open-set Few-shot Embedding Adaptation with Transformer
Paper Authors
Paper Abstract
Household speaker identification with few enrollment utterances is an important yet challenging problem, especially when household members share similar voice characteristics and room acoustics. A common embedding space learned from a large number of speakers is not universally suitable for optimally identifying every speaker in a given household. In this work, we first formulate household speaker identification as a few-shot open-set recognition task and then propose a novel embedding adaptation framework that uses a set-to-set function to adapt speaker representations from the given universal embedding space to a household-specific embedding space, yielding better household speaker identification performance. With our algorithm, Open-set Few-shot Embedding Adaptation with Transformer (openFEAT), we observe relative reductions of 23% to 31% in the speaker identification equal error rate (IEER) on simulated households with 2 to 7 hard-to-discriminate speakers.
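The abstract does not spell out the architecture. The following is therefore only a minimal sketch of how a set-to-set function can adapt a household's speaker representations and how an open-set decision might then be made. It is not the authors' implementation. It assumes that universal speaker embeddings have already been extracted and that each speaker's prototype is the normalized mean of their few enrollment embeddings. It also assumes the set-to-set function is a single-head self-attention layer with a residual connection. In a real model the projection matrices `w_q`, `w_k`, `w_v` would be learned; here they are plain inputs. Finally, an utterance is rejected as an unknown speaker when its best cosine score falls below a hypothetical `threshold`. All function names are illustrative.

```python
import math
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]  # row-major, shape (out_dim, in_dim)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(v: Vector) -> Vector:
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v] if norm > 0 else list(v)


def matvec(w: Matrix, v: Vector) -> Vector:
    return [dot(row, v) for row in w]


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def build_prototypes(enrollment: Dict[str, List[Vector]]) -> Dict[str, Vector]:
    """Few-shot enrollment: average each speaker's universal embeddings, then normalize."""
    protos = {}
    for name, embs in enrollment.items():
        dim = len(embs[0])
        mean = [sum(e[i] for e in embs) / len(embs) for i in range(dim)]
        protos[name] = normalize(mean)
    return protos


def adapt_prototypes(protos: Dict[str, Vector], w_q: Matrix, w_k: Matrix,
                     w_v: Matrix) -> Dict[str, Vector]:
    """Hypothetical set-to-set function: single-head self-attention over one household.

    Each prototype attends to every prototype of the same household, so its adapted
    embedding depends on who else is enrolled. A residual connection keeps the
    original prototype; w_v must map back to the prototype dimension for that sum.
    """
    names = list(protos)
    queries = {n: matvec(w_q, protos[n]) for n in names}
    keys = {n: matvec(w_k, protos[n]) for n in names}
    values = {n: matvec(w_v, protos[n]) for n in names}
    scale = math.sqrt(len(keys[names[0]]))  # scaled dot-product attention
    adapted = {}
    for n in names:
        weights = softmax([dot(queries[n], keys[m]) / scale for m in names])
        dim = len(protos[n])
        context = [sum(w * values[m][i] for w, m in zip(weights, names))
                   for i in range(dim)]
        adapted[n] = normalize([p + c for p, c in zip(protos[n], context)])
    return adapted


def identify(query: Vector, protos: Dict[str, Vector],
             threshold: float) -> Tuple[Optional[str], float]:
    """Open-set decision: best cosine match, or None ("unknown speaker") below threshold.

    Assumes unit-norm prototypes, as returned by build_prototypes and adapt_prototypes.
    """
    q = normalize(query)
    best_name, best_score = None, -math.inf
    for name, proto in protos.items():
        score = dot(q, proto)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < threshold:
        return None, best_score
    return best_name, best_score
```

Self-attention without positional encoding is permutation-equivariant, so the adapted prototypes do not depend on enrollment order. This is what makes it a set-to-set function. With identity matrices as placeholders, the layer only mixes prototypes and does not learn to separate similar voices. That separation would come from training the projections on household-like episodes, as the few-shot formulation suggests.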