Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL

Paper Title

Non-Programmers Can Label Programs Indirectly via Active Examples: A Case Study with Text-to-SQL

Authors

Ruiqi Zhong, Charlie Snell, Dan Klein, Jason Eisner

Abstract

Can non-programmers annotate natural language utterances with complex programs that represent their meaning? We introduce APEL, a framework in which non-programmers select among candidate programs generated by a seed semantic parser (e.g., Codex). Since they cannot understand the candidate programs, we ask them to select indirectly by examining the programs' input-output examples. For each utterance, APEL actively searches for a simple input on which the candidate programs tend to produce different outputs. It then asks the non-programmers only to choose the appropriate output, thus allowing us to infer which program is correct and could be used to fine-tune the parser. As a first case study, we recruited human non-programmers to use APEL to re-annotate SPIDER, a text-to-SQL dataset. Our approach achieved the same annotation accuracy as the original expert annotators (75%) and exposed many subtle errors in the original annotations.
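
The selection loop described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `singer` table, the three candidate queries, the hand-written candidate inputs, and the helper names `run`, `pick_input`, and `consistent` are hypothetical. The abstract does not state APEL's search criterion, so this sketch substitutes a simple heuristic: prefer the input that splits the candidates into the most distinct outputs, breaking ties toward fewer rows.

```python
import sqlite3

def run(sql, rows):
    """Execute one candidate query on a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
        conn.executemany("INSERT INTO singer VALUES (?, ?)", rows)
        result = conn.execute(sql).fetchall()
    finally:
        conn.close()
    # Sort so that row order does not make equal results look different.
    return tuple(sorted(result))

def pick_input(candidates, inputs):
    """Pick the input yielding the most distinct outputs; ties go to the smaller input."""
    def score(rows):
        outputs = {run(sql, rows) for sql in candidates}
        return (len(outputs), -len(rows))
    return max(inputs, key=score)

def consistent(candidates, rows, chosen_output):
    """Keep only the candidates whose output matches the annotator's choice."""
    return [sql for sql in candidates if run(sql, rows) == chosen_output]

# Utterance (illustrative): "How many singers are older than 30?"
candidates = [
    "SELECT COUNT(*) FROM singer WHERE age > 30",
    "SELECT COUNT(*) FROM singer WHERE age >= 30",
    "SELECT COUNT(*) FROM singer",
]
inputs = [
    [("Ann", 25), ("Bob", 40)],
    [("Ann", 30), ("Bob", 40), ("Cy", 20)],
    [("Ann", 35)],
]

best = pick_input(candidates, inputs)
# The annotator sees only the small table `best` and the possible answers,
# and picks the answer that fits the utterance: here, a count of 1.
remaining = consistent(candidates, best, ((1,),))
print(best)
print(remaining)
```

In this toy case the second input separates all three candidates, because it contains a singer aged exactly 30, and a single choice by the annotator is enough to identify the `age > 30` query.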
