Title
A light transformer for speech-to-intent applications
Authors
Abstract
Spoken language understanding (SLU) systems can make life more agreeable, safer (e.g., in a car), or can increase the independence of physically challenged users. However, due to the many sources of variation in speech, a well-trained system is hard to transfer to other conditions, such as a different language or speech-impaired users. A remedy is to design a user-taught SLU system that learns fully from scratch from the user's demonstrations, which in turn requires that the system's model converge quickly after only a few training samples. In this paper, we propose a light transformer structure that uses a simplified relative position encoding with the goal of reducing the model size and improving efficiency. The light transformer serves as an alternative speech encoder for an existing user-taught multitask SLU system. Experimental results on three datasets with challenging speech conditions show that our approach outperforms the existing system and other state-of-the-art models with half the original model size and training time.
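
The abstract names a simplified relative position encoding as the source of the size and efficiency gains but does not spell out its form. Below is a minimal sketch, assuming a per-head scalar bias indexed by clipped relative distance and added to the attention logits (a common lightweight simplification of Shaw et al.'s relative attention); all names, the clipping distance, and the dimensions are illustrative assumptions, not the paper's exact design.

    # Minimal sketch: self-attention with a simplified relative position
    # bias. NOT the paper's exact formulation (the abstract does not
    # specify it); the clipping distance and dimensions are assumptions.
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class RelPosSelfAttention(nn.Module):
        def __init__(self, d_model: int, n_heads: int, max_rel_dist: int = 64):
            super().__init__()
            assert d_model % n_heads == 0
            self.n_heads = n_heads
            self.d_head = d_model // n_heads
            self.max_rel_dist = max_rel_dist
            self.qkv = nn.Linear(d_model, 3 * d_model)
            self.out = nn.Linear(d_model, d_model)
            # One learned scalar per head and clipped relative distance,
            # instead of full relative key/value embeddings: far fewer
            # parameters than the standard relative-attention variant.
            self.rel_bias = nn.Parameter(torch.zeros(n_heads, 2 * max_rel_dist + 1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, time, d_model), e.g. a sequence of acoustic frames
            b, t, _ = x.shape
            q, k, v = self.qkv(x).chunk(3, dim=-1)

            def split(z):  # -> (batch, heads, time, d_head)
                return z.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

            q, k, v = split(q), split(k), split(v)

            # content-based attention logits
            logits = q @ k.transpose(-2, -1) / self.d_head ** 0.5

            # relative distances, clipped to [-max, max], shifted to >= 0
            pos = torch.arange(t, device=x.device)
            rel = (pos[None, :] - pos[:, None]).clamp(
                -self.max_rel_dist, self.max_rel_dist) + self.max_rel_dist
            logits = logits + self.rel_bias[:, rel]  # (heads, t, t) broadcast

            attn = F.softmax(logits, dim=-1)
            out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
            return self.out(out)

    # usage: encode a batch of two 100-frame speech feature sequences
    layer = RelPosSelfAttention(d_model=256, n_heads=4)
    y = layer(torch.randn(2, 100, 256))  # -> (2, 100, 256)

Because the bias depends only on clipped relative distance, the parameter count for position information is n_heads * (2 * max_rel_dist + 1) scalars, independent of sequence length, which is consistent with the abstract's goal of reducing model size.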