Paper Title

Text Information Aggregation with Centrality Attention

Paper Authors

Jingjing Gong, Hang Yan, Yining Zheng, Xipeng Qiu, Xuanjing Huang

Paper Abstract

Many natural language processing problems require encoding a text sequence into a fixed-length vector, which usually involves an aggregation step that combines the representations of all the words, such as pooling or self-attention. However, these widely used aggregation approaches do not take higher-order relationships among the words into consideration. Hence we propose a new way of obtaining aggregation weights, called eigen-centrality self-attention. More specifically, we build a fully-connected graph over all the words in a sentence, then compute the eigen-centrality as the attention score of each word. Explicitly modeling the relationships as a graph captures higher-order dependencies among words, which helps us achieve better results than baseline models such as pooling, self-attention, and dynamic routing on five text classification tasks and the SNLI task. To compute the dominant eigenvector of the graph, we adopt the power method algorithm to obtain the eigen-centrality measure. Moreover, we derive an iterative approach to compute the gradient of the power method, which reduces both memory consumption and computation cost.
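
The abstract describes the forward computation (fully-connected word graph, then power iteration to the dominant eigenvector) but gives no pseudocode. Below is a minimal NumPy sketch of that forward pass under stated assumptions: the edge-weight function (a plain dot-product similarity passed through `exp` to keep weights non-negative), the iteration count `num_iters`, and the final score normalization are all illustrative choices, not the paper's exact formulation, and the memory-efficient gradient derivation is not reproduced here.

```python
import numpy as np

def eigen_centrality_attention(word_vecs, num_iters=50, eps=1e-8):
    """Sketch of eigen-centrality self-attention (forward pass only).

    word_vecs: (n, d) array of word representations.
    Returns a (d,) fixed-length sentence vector.
    """
    # Fully-connected graph over the words: edge weights are
    # non-negative pairwise similarities (assumed form, see lead-in).
    sim = word_vecs @ word_vecs.T          # (n, n) dot-product scores
    adj = np.exp(sim - sim.max())          # keep all edge weights positive

    # Power method: repeated multiplication by the adjacency matrix
    # converges to its dominant eigenvector, whose entries serve as
    # the eigen-centrality score of each word.
    v = np.ones(adj.shape[0]) / adj.shape[0]
    for _ in range(num_iters):
        v = adj @ v
        v = v / (np.linalg.norm(v) + eps)  # renormalize each iteration

    # Turn centrality scores into attention weights and aggregate the
    # word representations into a single fixed-length vector.
    attn = v / (v.sum() + eps)
    return attn @ word_vecs

# Usage: 7 words with 32-dimensional representations.
rng = np.random.default_rng(0)
sent = rng.normal(size=(7, 32))
print(eigen_centrality_attention(sent).shape)  # (32,)
```

For a non-negative adjacency matrix the power iteration converges to the Perron (dominant) eigenvector, so the resulting weights are non-negative and can be normalized directly into an attention distribution; in the paper this loop is differentiated with a dedicated iterative gradient rather than by unrolling every iteration.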
