Paper Title

Dynamic Query Selection for Fast Visual Perceiver

Authors

Corentin Dancette, Matthieu Cord

Abstract

In recent work, Transformers have matched deep convolutional networks as vision architectures. Most work focuses on achieving the best results on large-scale benchmarks, and scaling seems to be the most successful strategy: bigger models, more data, and longer training result in higher performance. However, the reduction of network complexity and inference time remains under-explored. The Perceiver model offers a solution to this problem: by first performing a cross-attention with a fixed number Q of latent query tokens, the complexity of the L-layer Transformer network that follows is bounded by O(LQ^2). In this work, we explore how to make Perceivers even more efficient by reducing the number of queries Q during inference while limiting the accuracy drop.
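
To make the O(LQ^2) bound concrete, here is a minimal NumPy sketch of the Perceiver-style bottleneck described in the abstract. It is an illustration under stated assumptions, not the authors' implementation: the sizes (N, d, Q, L), the single-head attention without learned projections, and the helper names are all hypothetical. In particular, keeping the first k queries is only a placeholder selection rule, since the abstract does not say how queries are chosen.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, inputs):
    """Single-head cross-attention without learned projections (a simplification).

    queries: (Q, d) latent query tokens
    inputs:  (N, d) input tokens (e.g. image patches)
    returns: (Q, d) latent array fed to the Transformer that follows
    """
    d = queries.shape[-1]
    scores = queries @ inputs.T / np.sqrt(d)   # (Q, N)
    return softmax(scores, axis=-1) @ inputs   # (Q, d)

def latent_attention_pairs(num_layers, num_queries):
    # Pairwise attention scores in the latent Transformer: L * Q^2.
    return num_layers * num_queries ** 2

# Hypothetical sizes, chosen only for illustration.
N, d, Q, L = 196, 64, 128, 8
rng = np.random.default_rng(0)
inputs = rng.normal(size=(N, d))
latent_queries = rng.normal(size=(Q, d))

# Full model: all Q queries attend to the input.
full = cross_attention(latent_queries, inputs)

# Reduced inference: keep only k queries.
# ASSUMPTION: taking the first k is a placeholder, not the paper's selection method.
k = 32
reduced = cross_attention(latent_queries[:k], inputs)

full_cost = latent_attention_pairs(L, Q)
reduced_cost = latent_attention_pairs(L, k)
print(full.shape, reduced.shape, full_cost / reduced_cost)
```

Because the latent self-attention cost grows with Q^2, cutting the number of queries by a factor of 4 in this sketch cuts the count of pairwise attention scores by a factor of 16. The sketch says nothing about the accuracy side of the trade-off.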
