Semantic Complexity in End-to-End Spoken Language Understanding

Paper title

Semantic Complexity in End-to-End Spoken Language Understanding

Authors

Joseph P. McKenna, Samridhi Choudhary, Michael Saxon, Grant P. Strimel, Athanasios Mouchtaris

Abstract

End-to-end spoken language understanding (SLU) models are a class of model architectures that predict semantics directly from speech. Because of their input and output types, we refer to them as speech-to-interpretation (STI) models. Previous works have successfully applied STI models to targeted use cases, such as recognizing home automation commands; however, no study has yet addressed how these models generalize to broader use cases. In this work, we analyze the relationship between the performance of STI models and the difficulty of the use case to which they are applied. We introduce empirical measures of dataset semantic complexity to quantify the difficulty of the SLU tasks. We show that near-perfect performance metrics for STI models reported in the literature were obtained with datasets that have low semantic complexity values. We perform experiments where we vary the semantic complexity of a large, proprietary dataset and show that STI model performance correlates with our semantic complexity measures, such that performance increases as complexity values decrease. Our results show that it is important to contextualize an STI model's performance with the complexity values of its training dataset to reveal the scope of its applicability.
