A Multi-Task Semi-Supervised Framework for Text2Graph & Graph2Text

Paper Title

A multi-task semi-supervised framework for Text2Graph & Graph2Text

Authors

Oriol Domingo, Marta R. Costa-jussà, Carlos Escolano

Abstract

The Artificial Intelligence industry regularly develops applications that mostly rely on Knowledge Bases, data repositories about specific or general domains, usually represented as graphs. Like other databases, they face two main challenges: information ingestion and information retrieval. We approach these challenges by jointly learning graph extraction from text and text generation from graphs. The proposed solution, a T5 architecture, is trained in a multi-task semi-supervised environment, with our collected non-parallel data, following a cycle training regime. Experiments on the WebNLG dataset show that our approach surpasses unsupervised state-of-the-art results in text-to-graph and graph-to-text. More relevantly, our framework is more consistent across seen and unseen domains than supervised models. The resulting model can be easily trained in any new domain with non-parallel data, by simply adding text and graphs about it, in our cycle framework.
