Paper Title

What Dense Graph Do You Need for Self-Attention?

Paper Authors

Yuxin Wang, Chu-Tak Lee, Qipeng Guo, Zhangyue Yin, Yunhua Zhou, Xuanjing Huang, Xipeng Qiu

Paper Abstract

Transformers have made progress on a wide range of tasks, but suffer from quadratic computational and memory complexity. Recent works propose sparse Transformers that restrict attention to sparse graphs to reduce complexity while maintaining strong performance. While effective, the crucial question of how dense a graph needs to be to perform well has not been fully explored. In this paper, we propose Normalized Information Payload (NIP), a graph scoring function that measures information transfer on a graph and provides an analysis tool for the trade-off between performance and complexity. Guided by this theoretical analysis, we present Hypercube Transformer, a sparse Transformer that models token interactions in a hypercube and achieves comparable or even better results than the vanilla Transformer while yielding $O(N\log N)$ complexity with sequence length $N$. Experiments on tasks requiring various sequence lengths validate our graph scoring function.
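To make the hypercube connectivity pattern concrete, the sketch below builds a boolean attention mask in which tokens may attend to each other when their indices differ in exactly one bit (plus self-attention), so each token has roughly $\log_2 N$ neighbors and the mask contains $O(N\log N)$ allowed entries. This is only an illustrative reconstruction of the graph structure described in the abstract, not the authors' released implementation; the function name `hypercube_attention_mask` is hypothetical.

```python
import numpy as np

def hypercube_attention_mask(seq_len: int) -> np.ndarray:
    """Boolean mask for hypercube-connected self-attention (illustrative sketch).

    Tokens i and j may attend to each other when their indices differ in
    exactly one bit, plus self-attention, giving ~log2(N) neighbors per token
    and O(N log N) nonzero entries overall.
    """
    # Dimension of the smallest hypercube covering the sequence.
    dim = max(1, int(np.ceil(np.log2(seq_len))))
    mask = np.zeros((seq_len, seq_len), dtype=bool)
    idx = np.arange(seq_len)
    mask[idx, idx] = True  # every token attends to itself
    for k in range(dim):
        neighbor = idx ^ (1 << k)      # flip the k-th bit of each token index
        valid = neighbor < seq_len     # drop neighbors outside the sequence
        mask[idx[valid], neighbor[valid]] = True
    return mask

if __name__ == "__main__":
    m = hypercube_attention_mask(16)
    # Each row allows ~log2(16) + 1 = 5 positions out of 16.
    print(m.sum(axis=1))
```

In a sparse Transformer, such a mask would typically be applied to the attention logits (setting disallowed positions to negative infinity before the softmax), so only the hypercube neighbors contribute to each token's representation.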
