Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings

Paper title

Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings

Authors

Abdullah, Badr M., Möbius, Bernd, Klakow, Dietrich

Abstract

Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensionality vector representations such that different acoustic exemplars of the same word are projected nearby in the embedding space. In addition to their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up approach that integrates acoustic cues to build up a word representation given an acoustic or symbolic supervision signal. Therefore, these models do not leverage or capture high-level lexical knowledge during the learning process. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the training procedure of AWEs. Our model learns a mapping between the acoustic input and a lexical representation that encodes high-level information such as word semantics in addition to bottom-up form-based supervision. We experiment with three languages and demonstrate that incorporating lexical knowledge improves the embedding space discriminability and encourages the model to better separate lexical categories.
