Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation

Paper title

Addressing Resource and Privacy Constraints in Semantic Parsing Through Data Augmentation

Authors

Kevin Yang, Olivia Deng, Charles Chen, Richard Shin, Subhro Roy, Benjamin Van Durme

Abstract

We introduce a novel setup for low-resource task-oriented semantic parsing which incorporates several constraints that may arise in real-world scenarios: (1) lack of similar datasets/models from a related domain, (2) inability to sample useful logical forms directly from a grammar, and (3) privacy requirements for unlabeled natural utterances. Our goal is to improve a low-resource semantic parser using utterances collected through user interactions. In this highly challenging but realistic setting, we investigate data augmentation approaches involving generating a set of structured canonical utterances corresponding to logical forms, before simulating corresponding natural language and filtering the resulting pairs. We find that such approaches are effective despite our restrictive setup: in a low-resource setting on the complex SMCalFlow calendaring dataset (Andreas et al., 2020), we observe 33% relative improvement over a non-data-augmented baseline in top-1 match.
