Paper Title

Text Information Aggregation with Centrality Attention

Paper Authors

Jingjing Gong, Hang Yan, Yining Zheng, Xipeng Qiu, Xuanjing Huang

Paper Abstract

Many natural language processing problems require encoding a text sequence into a fixed-length vector, which usually involves an aggregation step that combines the representations of all the words, such as pooling or self-attention. However, these widely used aggregation approaches do not take higher-order relationships among the words into consideration. Hence we propose a new way of obtaining aggregation weights, called eigen-centrality self-attention. More specifically, we build a fully-connected graph over all the words in a sentence, then compute the eigen-centrality as the attention score of each word. Explicitly modeling the relationships as a graph captures higher-order dependencies among words, which helps us achieve better results than baseline models such as pooling, self-attention, and dynamic routing on five text classification tasks and the SNLI task. To compute the dominant eigenvector of the graph, we adopt the power method algorithm to obtain the eigen-centrality measure. Moreover, we derive an iterative approach to compute the gradient of the power method, which reduces both memory consumption and computation cost.
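
The abstract describes the forward computation (fully-connected word graph, then power iteration to the dominant eigenvector) but gives no pseudocode. Below is a minimal NumPy sketch of that forward pass under stated assumptions: the edge-weight function (a plain dot-product similarity passed through `exp` to keep weights non-negative), the iteration count `num_iters`, and the final score normalization are all illustrative choices, not the paper's exact formulation, and the memory-efficient gradient derivation is not reproduced here.

```python
import numpy as np

def eigen_centrality_attention(word_vecs, num_iters=50, eps=1e-8):
    """Sketch of eigen-centrality self-attention (forward pass only).

    word_vecs: (n, d) array of word representations.
    Returns a (d,) fixed-length sentence vector.
    """
    # Fully-connected graph over the words: edge weights are
    # non-negative pairwise similarities (assumed form, see lead-in).
    sim = word_vecs @ word_vecs.T          # (n, n) dot-product scores
    adj = np.exp(sim - sim.max())          # keep all edge weights positive

    # Power method: repeated multiplication by the adjacency matrix
    # converges to its dominant eigenvector, whose entries serve as
    # the eigen-centrality score of each word.
    v = np.ones(adj.shape[0]) / adj.shape[0]
    for _ in range(num_iters):
        v = adj @ v
        v = v / (np.linalg.norm(v) + eps)  # renormalize each iteration

    # Turn centrality scores into attention weights and aggregate the
    # word representations into a single fixed-length vector.
    attn = v / (v.sum() + eps)
    return attn @ word_vecs

# Usage: 7 words with 32-dimensional representations.
rng = np.random.default_rng(0)
sent = rng.normal(size=(7, 32))
print(eigen_centrality_attention(sent).shape)  # (32,)
```

For a non-negative adjacency matrix the power iteration converges to the Perron (dominant) eigenvector, so the resulting weights are non-negative and can be normalized directly into an attention distribution; in the paper this loop is differentiated with a dedicated iterative gradient rather than by unrolling every iteration.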
