Rows from Many Sources: Enriching row completions from Wikidata with a pre-trained Language Model

Paper Title

Rows from Many Sources: Enriching row completions from Wikidata with a pre-trained Language Model

Authors

Negreanu, Carina; Karaoglu, Alperen; Williams, Jack; Chen, Shuang; Fabian, Daniel; Gordon, Andrew; Lin, Chin-Yew

Abstract

Row completion is the task of augmenting a given table of text and numbers with additional, relevant rows. The task divides into two steps: subject suggestion, the task of populating the main column; and gap filling, the task of populating the remaining columns. We present state-of-the-art results for subject suggestion and gap filling measured on a standard benchmark (WikiTables). Our idea is to solve this task by harmoniously combining knowledge base table interpretation and free text generation. We interpret the table using the knowledge base to suggest new rows and generate metadata like headers through property linking. To improve candidate diversity, we synthesize additional rows using free text generation via GPT-3, and crucially, we exploit the metadata we interpret to produce better prompts for text generation. Finally, we verify that the additional synthesized content can be linked to the knowledge base or a trusted web source such as Wikipedia.
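
As a rough illustration of the prompt-construction idea in the abstract, the sketch below serializes a table, together with interpreted metadata such as a caption and column headers, into text that a language model could continue with new rows. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `build_row_prompt` and `parse_generated_rows`, the pipe-separated layout, and the sample table values are all hypothetical, and no language-model API or knowledge-base lookup is called.

```python
from typing import List


def build_row_prompt(caption: str, headers: List[str], rows: List[List[str]]) -> str:
    """Serialize a table as pipe-separated text that a language model can continue.

    The caption and headers stand in for the interpreted metadata (assumed layout).
    """
    lines = [f"Table: {caption}", " | ".join(headers)]
    lines += [" | ".join(row) for row in rows]
    # End with a newline so a continuation would start on a fresh row.
    return "\n".join(lines) + "\n"


def parse_generated_rows(completion: str, n_columns: int) -> List[List[str]]:
    """Keep only generated lines that split into exactly n_columns non-empty cells."""
    parsed = []
    for line in completion.splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        if len(cells) == n_columns and all(cells):
            parsed.append(cells)
    return parsed


# Illustrative sample data only (hypothetical values).
prompt = build_row_prompt(
    "Example rivers",
    ["River", "Length (km)"],
    [["River A", "100"], ["River B", "200"]],
)

# A made-up completion standing in for model output; malformed lines are dropped.
candidates = parse_generated_rows("River C | 300\nnot a row\nRiver D | \n", 2)
```

In a pipeline like the one the abstract describes, candidate rows parsed this way would still need the final verification step, in which synthesized content is linked to the knowledge base or a trusted web source; that step is not sketched here.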
