Paper Title

TIE: Topological Information Enhanced Structural Reading Comprehension on Web Pages

Authors

Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen, Kai Yu

Abstract

Recently, the structural reading comprehension (SRC) task on web pages has attracted increasing research interest. Although previous SRC work has leveraged extra information such as HTML tags or XPaths, the informative topology of web pages has not been effectively exploited. In this work, we propose a Topological Information Enhanced model (TIE), which transforms the token-level task into a tag-level task by introducing a two-stage process (i.e., node locating and answer refining). Building on this, TIE integrates a Graph Attention Network (GAT) and a Pre-trained Language Model (PLM) to leverage the topological information of both logical structures and spatial structures. Experimental results demonstrate that our model outperforms strong baselines and achieves state-of-the-art performance on the web-based SRC benchmark WebSRC at the time of writing. The code of TIE will be publicly available at https://github.com/X-LANCE/TIE.
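
The two-stage process named in the abstract (node locating, then answer refining) can be illustrated with a toy sketch. This is not the authors' implementation, which lives in the repository linked above. It assumes the scores are already available: `node_scores` stands in for the tag-level model's output, `start_scores` and `end_scores` stand in for a token-level model's output, and `node_spans` maps each node to the inclusive token range it covers. All of these names, and the rule of summing start and end scores, are hypothetical choices made only for illustration.

```python
from typing import Dict, List, Tuple


def locate_node(node_scores: Dict[str, float]) -> str:
    """Stage 1 (node locating): return the id of the highest-scoring node."""
    return max(node_scores, key=node_scores.get)


def refine_answer(
    start_scores: List[float],
    end_scores: List[float],
    node_span: Tuple[int, int],
) -> Tuple[int, int]:
    """Stage 2 (answer refining): best token span inside node_span (inclusive).

    The span score is start_scores[s] + end_scores[e], an assumed scoring rule.
    """
    lo, hi = node_span
    best, best_score = (lo, lo), float("-inf")
    for s in range(lo, hi + 1):
        for e in range(s, hi + 1):
            score = start_scores[s] + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


def two_stage_answer(
    node_scores: Dict[str, float],
    node_spans: Dict[str, Tuple[int, int]],
    start_scores: List[float],
    end_scores: List[float],
) -> Tuple[str, Tuple[int, int]]:
    """Pick a node first, then search for the answer span only inside it."""
    node_id = locate_node(node_scores)
    return node_id, refine_answer(start_scores, end_scores, node_spans[node_id])


# Hypothetical toy inputs: two nodes over a six-token page.
node_scores = {"div[1]": 0.2, "span[2]": 0.9}
node_spans = {"div[1]": (0, 5), "span[2]": (3, 5)}
start_scores = [5.0, 0.1, 0.1, 0.3, 1.2, 0.2]
end_scores = [4.0, 0.1, 0.1, 0.2, 0.4, 1.5]

result = two_stage_answer(node_scores, node_spans, start_scores, end_scores)
```

In this toy input, an unconstrained token-level search would pick token 0, which has the highest start and end scores. Restricting the search to the located node moves the answer to tokens 4 through 5, which is the effect the tag-level reformulation is meant to have.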
