Dynamic Query Selection for Fast Visual Perceiver

Paper Title

Dynamic Query Selection for Fast Visual Perceiver

Authors

Corentin Dancette, Matthieu Cord

Abstract

In recent work, Transformers have matched deep convolutional networks as vision architectures. Most work focuses on obtaining the best results on large-scale benchmarks, and scaling laws seem to be the most successful strategy: bigger models, more data, and longer training result in higher performance. However, reducing network complexity and inference time remains under-explored. The Perceiver model offers a solution to this problem: by first performing a cross-attention with a fixed number Q of latent query tokens, the complexity of the L-layer Transformer network that follows is bounded by O(LQ^2). In this work, we explore how to make Perceivers even more efficient by reducing the number of queries Q during inference while limiting the accuracy drop.
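
The O(LQ^2) bound in the abstract can be illustrated with a rough count of token-pair interactions. The sketch below is not the authors' code: the function names, the input size, and the layer count are hypothetical, and the count ignores the hidden dimension and constant factors. It only shows why halving Q at inference cuts the cost of the latent stack by roughly a factor of four, while the initial cross-attention shrinks linearly.

```python
def cross_attention_cost(num_inputs: int, num_queries: int) -> int:
    """Query-input pairs in the initial cross-attention: Q * N."""
    return num_queries * num_inputs


def latent_stack_cost(num_layers: int, num_queries: int) -> int:
    """Query-query pairs across L self-attention layers: L * Q^2."""
    return num_layers * num_queries ** 2


# Hypothetical sizes, chosen only for illustration (not taken from the paper).
NUM_INPUTS = 224 * 224  # assumes one input token per pixel of a 224x224 image
NUM_LAYERS = 8

for q in (512, 256, 128):
    cross = cross_attention_cost(NUM_INPUTS, q)
    latent = latent_stack_cost(NUM_LAYERS, q)
    print(f"Q={q}: cross-attention pairs={cross}, latent-stack pairs={latent}")
```

Under these assumptions the cross-attention term is linear in Q and the latent stack is quadratic in Q, so which term dominates depends on the input size and depth.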
