Paper Title

Describing Differences between Text Distributions with Natural Language

Paper Authors

Ruiqi Zhong, Charlie Snell, Dan Klein, Jacob Steinhardt

Abstract

How do two distributions of texts differ? Humans are slow at answering this, since discovering patterns might require tediously reading through hundreds of samples. We propose to automatically summarize the differences by "learning a natural language hypothesis": given two distributions $D_{0}$ and $D_{1}$, we search for a description that is more often true for $D_{1}$, e.g., "is military-related." To tackle this problem, we fine-tune GPT-3 to propose descriptions with the prompt: "[samples of $D_{0}$] + [samples of $D_{1}$] + the difference between them is _____." We then re-rank the descriptions by checking how often they hold on a larger set of samples with a learned verifier. On a benchmark of 54 real-world binary classification tasks, while GPT-3 Curie (13B) only generates a description similar to human annotation 7% of the time, the performance reaches 61% with fine-tuning and re-ranking, and our best system using GPT-3 Davinci (175B) reaches 76%. We apply our system to describe distribution shifts, debug dataset shortcuts, summarize unknown tasks, and label text clusters, and present analyses based on automatically generated descriptions.
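The propose-then-re-rank pipeline in the abstract can be sketched in a few lines of Python. This is only an illustrative mock, not the paper's implementation: `build_prompt` follows the prompt format quoted above, while `toy_verifier` is a hypothetical keyword-matching stand-in for the learned verifier, and the $D_1 - D_0$ scoring rule is an assumption about how "more often true for $D_1$" could be operationalized.

```python
# Hypothetical sketch of the proposer + verifier pipeline described in the
# abstract. A real system would call a fine-tuned GPT-3 proposer and a
# learned verifier; here both are stubbed out with toy logic.

def build_prompt(samples_d0, samples_d1):
    """Format the proposer prompt:
    '[samples of D0] + [samples of D1] + the difference between them is _____'."""
    block0 = "\n".join(f"Group A: {s}" for s in samples_d0)
    block1 = "\n".join(f"Group B: {s}" for s in samples_d1)
    return f"{block0}\n{block1}\nthe difference between them is _____"

def toy_verifier(description, sample):
    """Stand-in for the learned verifier: does `description` hold on `sample`?
    Here we just check whether the description's last word appears in the text."""
    keyword = description.split()[-1].strip('."').lower()
    return keyword in sample.lower()

def rerank(descriptions, samples_d0, samples_d1, verifier=toy_verifier):
    """Rank candidate descriptions by how much more often the verifier says
    they hold on samples from D1 than on samples from D0."""
    def score(desc):
        p1 = sum(verifier(desc, s) for s in samples_d1) / len(samples_d1)
        p0 = sum(verifier(desc, s) for s in samples_d0) / len(samples_d0)
        return p1 - p0
    return sorted(descriptions, key=score, reverse=True)

# Toy usage mirroring the "is military-related" example:
d0 = ["The recipe calls for two eggs.", "The cafe opens at nine."]
d1 = ["The army deployed new tanks.", "Soldiers of the army marched north."]
candidates = ["mentions the army", "mentions eggs"]
ranked = rerank(candidates, d0, d1)  # "mentions the army" ranks first
```

The key design point the sketch preserves is the division of labor: a proposer generates cheap candidate hypotheses from a few samples, and a verifier re-ranks them on a larger sample set, so the final description is grounded in more data than fits in the proposer's prompt.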
