Paper Title
Exploring the sequence length bottleneck in the Transformer for Image Captioning
Paper Authors
Paper Abstract
Most recent state-of-the-art architectures rely on combinations and variations of three approaches: convolutional, recurrent, and self-attentive methods. Our work attempts to lay the basis for a new research direction in sequence modeling based upon the idea of modifying the sequence length. To this end, we propose a new method called the "Expansion Mechanism", which transforms, either dynamically or statically, the input sequence into a new one with a different sequence length. Furthermore, we introduce a novel architecture that exploits this method and achieves competitive performance on the MS-COCO 2014 dataset, yielding 134.6 and 131.4 CIDEr-D on the Karpathy test split in the ensemble and single-model configurations respectively, and 130 CIDEr-D on the official online evaluation server, despite being neither recurrent nor fully attentive. At the same time, we address efficiency in our design and introduce a training strategy that, in contrast to the standard one, is suitable for most computational budgets. Source code is available at https://github.com/jchenghu/exploring
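The abstract does not detail how the Expansion Mechanism is implemented (the authors' actual code is in the linked repository). As a purely illustrative aid, the sketch below shows one hypothetical way a module could map a sequence of length L to a chosen output length via learned query vectors; the class name, the attention-style pooling, and all shapes are assumptions for illustration, not the paper's method.

```python
import torch
import torch.nn as nn

class StaticExpansion(nn.Module):
    """Hypothetical sketch: map (B, L, D) -> (B, out_len, D) for a fixed out_len.

    This is NOT the paper's Expansion Mechanism; it only illustrates the
    general idea of changing a sequence's length with learned queries.
    """
    def __init__(self, d_model: int, out_len: int):
        super().__init__()
        # One learned query per output position.
        self.queries = nn.Parameter(torch.randn(out_len, d_model))
        self.scale = d_model ** -0.5

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, L, D); attention weights: (B, out_len, L)
        attn = torch.softmax(self.queries @ x.transpose(1, 2) * self.scale, dim=-1)
        # Weighted sum over the input positions: (B, out_len, D)
        return attn @ x

# Usage: expand 49 image-region features to 100 tokens, or compress to 10.
x = torch.randn(2, 49, 512)
print(StaticExpansion(512, 100)(x).shape)  # torch.Size([2, 100, 512])
print(StaticExpansion(512, 10)(x).shape)   # torch.Size([2, 10, 512])
```

A "dynamic" variant, as the abstract's wording suggests, would presumably derive the target length or the queries from the input itself rather than fixing them at construction time.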