Title
Self-Aware Feedback-Based Self-Learning in Large-Scale Conversational AI
Authors
Abstract
Self-learning paradigms in large-scale conversational AI agents tend to leverage user feedback to bridge the gap between what users say and what they mean. However, such learning, particularly in Markov-based query rewriting systems, has yet to address the impact of these models on their own future training, where successive feedback is inevitably contingent on the rewrites themselves, especially in a continually updating environment. In this paper, we explore the consequences of this inherent lack of self-awareness on model performance, which ultimately manifests as both Type I and Type II errors over time. To that end, we propose augmenting the Markov graph construction with a superposition-based adjacency matrix. Our method leverages induced stochasticity to reactively learn a locally adaptive decision boundary based on the performance of individual rewrites in a bi-variate beta setting. We also present a data augmentation strategy that leverages template-based generation to abridge the complex conversation hierarchies of multi-turn dialogs, simplifying the learning process. All in all, we demonstrate that our self-aware model improves overall PR-AUC by 27.45%, achieves a relative defect reduction of up to 31.22%, and adapts more quickly to shifts in global preferences across a large number of customers.
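As an illustrative sketch only (the abstract does not specify the implementation): the "bi-variate beta setting" with induced stochasticity can be read as each candidate rewrite maintaining a Beta posterior over its friction-free success rate, with a Thompson-style sampled comparison acting as the locally adaptive decision boundary. All names here (`RewriteArm`, `choose`) are hypothetical.

```python
import random


class RewriteArm:
    """Tracks one rewrite's user feedback as a Beta(successes+1, defects+1) posterior.

    Hypothetical stand-in for the paper's per-rewrite beta model; the
    uniform Beta(1, 1) prior is an assumption, not taken from the paper.
    """

    def __init__(self):
        self.successes = 0
        self.defects = 0

    def sample(self):
        # Thompson-style draw: the induced stochasticity that softens the
        # decision boundary instead of using a fixed global threshold.
        return random.betavariate(self.successes + 1, self.defects + 1)

    def update(self, user_accepted):
        # Implicit/explicit user feedback on the served hypothesis.
        if user_accepted:
            self.successes += 1
        else:
            self.defects += 1


def choose(original, rewrite, arms):
    """Serve the rewrite only if its sampled quality beats the original's.

    Because both sides are sampled, the effective boundary adapts locally
    to each query pair's observed performance over time.
    """
    return rewrite if arms[rewrite].sample() > arms[original].sample() else original
```

For example, a rewrite that keeps accumulating defects will quickly start losing the sampled comparison, so traffic falls back to the original query without any hand-tuned threshold.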