Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

Paper title

Attentive Temporal Pooling for Conformer-based Streaming Language Identification in Long-form Speech

Authors

Quan Wang, Yang Yu, Jason Pelecanos, Yiling Huang, Ignacio Lopez Moreno

Abstract

In this paper, we introduce a novel language identification system based on conformer layers. We propose an attentive temporal pooling mechanism to allow the model to carry information in long-form audio via a recurrent form, such that the inference can be performed in a streaming fashion. Additionally, we investigate two domain adaptation approaches to allow adapting an existing language identification model without retraining the model parameters for a new domain. We perform a comparative study of different model topologies under different constraints of model size, and find that conformer-based models significantly outperform LSTM and transformer based models. Our experiments also show that attentive temporal pooling and domain adaptation improve model accuracy.
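
The abstract describes attentive temporal pooling as carrying information through long-form audio "via a recurrent form". One plausible reading, sketched below, is an attention-weighted average over frames that is updated one frame at a time from a small running state. This is an illustrative sketch only and not necessarily the formulation used in the paper: the class name, the `query` scoring vector, and the dot-product scoring are all assumptions introduced here.

```python
import math


class StreamingAttentivePooling:
    """Softmax-weighted running average over frames, updated per frame.

    Hypothetical sketch: `query` stands in for whatever learned scoring
    function a real model would use to weight each frame.
    """

    def __init__(self, query):
        self.query = list(query)
        self.max_score = -math.inf  # running max, for numerical stability
        self.weight_sum = 0.0       # running sum of (rescaled) weights
        self.weighted_sum = [0.0] * len(self.query)

    def update(self, frame):
        """Consume one frame and return the pooled vector so far."""
        score = sum(q * x for q, x in zip(self.query, frame))
        new_max = max(self.max_score, score)
        # Rescale the old accumulators to the new reference maximum.
        scale = math.exp(self.max_score - new_max) if self.weight_sum > 0.0 else 0.0
        weight = math.exp(score - new_max)
        self.weighted_sum = [
            scale * acc + weight * x for acc, x in zip(self.weighted_sum, frame)
        ]
        self.weight_sum = scale * self.weight_sum + weight
        self.max_score = new_max
        return [acc / self.weight_sum for acc in self.weighted_sum]


def batch_attentive_pooling(query, frames):
    """Non-streaming reference: softmax over all frame scores at once."""
    scores = [sum(q * x for q, x in zip(query, f)) for f in frames]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [
        sum(w * f[i] for w, f in zip(weights, frames)) / total
        for i in range(len(query))
    ]
```

Under these assumptions the state kept between frames has a fixed size (one vector and two scalars), so memory does not grow with audio length, and the streaming result after the last frame matches the batch softmax-weighted average.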

Paper title