Title

Robust (Controlled) Table-to-Text Generation with Structure-Aware Equivariance Learning

Authors

Fei Wang, Zhewei Xu, Pedro Szekely, Muhao Chen

Abstract

Controlled table-to-text generation seeks to generate natural language descriptions for highlighted subparts of a table. Previous SOTA systems still employ a sequence-to-sequence generation method, which merely captures the table as a linear structure and is brittle when table layouts change. We seek to go beyond this paradigm by (1) effectively expressing the relations of content pieces in the table, and (2) making our model robust to content-invariant structural transformations. Accordingly, we propose an equivariance learning framework, which encodes tables with a structure-aware self-attention mechanism. This prunes the full self-attention structure into an order-invariant graph attention that captures the connected graph structure of cells belonging to the same row or column, and it differentiates between relevant cells and irrelevant cells from the structural perspective. Our framework also modifies the positional encoding mechanism to preserve the relative position of tokens in the same cell but enforce position invariance among different cells. Our technology is free to be plugged into existing table-to-text generation models, and has improved T5-based models to offer better performance on ToTTo and HiTab. Moreover, on a harder version of ToTTo, we preserve promising performance, while previous SOTA systems, even with transformation-based data augmentation, have seen significant performance drops. Our code is available at https://github.com/luka-group/Lattice.
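The two core ideas in the abstract can be illustrated concretely. Below is a minimal sketch (not the authors' released code) of (1) a structure-aware attention mask that only connects tokens whose cells share a row or a column, and (2) per-cell position ids that preserve relative positions of tokens within a cell while making positions identical across cells. The cell-coordinate input format and the helper name are illustrative assumptions.

```python
def structure_aware_mask_and_positions(cells):
    """Sketch of Lattice-style structure-aware attention inputs.

    cells: list of (row, col, tokens) for the highlighted sub-table.
    Returns an attention mask and per-cell position ids. This is an
    illustrative reconstruction from the abstract, not the paper's code.
    """
    coords, positions = [], []
    for row, col, tokens in cells:
        for k, _ in enumerate(tokens):
            coords.append((row, col))
            positions.append(k)  # position restarts at 0 in every cell
    n = len(coords)
    # mask[i][j] is True iff token i may attend to token j: same row or
    # same column (same cell included). Structurally unrelated cells are
    # pruned from the full self-attention graph.
    mask = [
        [coords[i][0] == coords[j][0] or coords[i][1] == coords[j][1]
         for j in range(n)]
        for i in range(n)
    ]
    return mask, positions

# A 2x2 toy table: headers "Team"/"Wins" over one data row.
cells = [(0, 0, ["Team"]), (0, 1, ["Wins"]),
         (1, 0, ["Lions"]), (1, 1, ["12"])]
mask, pos = structure_aware_mask_and_positions(cells)
```

Because the mask depends only on shared rows and columns, permuting the order in which cells are fed to the encoder leaves the attention graph unchanged, and because position ids restart per cell, there is no linear-order signal to break when the table layout is transformed.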
