Paper title
Video Frame Interpolation with Transformer
Authors
Abstract
Video frame interpolation (VFI), which aims to synthesize intermediate frames of a video, has made remarkable progress with the development of deep convolutional networks over the past years. Existing methods built upon convolutional networks generally struggle to handle large motion because convolution operations are local. To overcome this limitation, we introduce a novel framework that uses a Transformer to model long-range pixel correlation among video frames. Further, our network is equipped with a novel cross-scale window-based attention mechanism, in which windows at different scales interact with each other. This design effectively enlarges the receptive field and aggregates multi-scale information. Extensive quantitative and qualitative experiments demonstrate that our method achieves new state-of-the-art results on various benchmarks.
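The abstract does not spell out how the cross-scale windows interact, so the following is only a minimal illustrative sketch of one way such an interaction could look, not the authors' implementation. It assumes that queries from a fine-scale window attend to keys and values drawn from the same window plus a window of equal token count taken from a 2x average-pooled copy of the feature map, which therefore covers a larger area. The function names, the pooling factor, the single attention head, and the use of identity projections in place of learned ones are all assumptions made for illustration.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def avg_pool2x(feat):
    """Halve the spatial size of an (H, W, C) map by 2x2 average pooling."""
    h, w, c = feat.shape
    return feat.reshape(h // 2, 2, w // 2, 2, c).mean(axis=(1, 3))


def cross_scale_window_attention(feat, top, left, win=4):
    """Toy cross-scale attention for one window (hypothetical helper).

    feat:        (H, W, C) feature map with even H and W, H/2 and W/2 >= win.
    (top, left): top-left corner of the fine window inside the map.
    Returns the attended tokens (win*win, C) and the attention weights
    (win*win, 2*win*win).
    """
    h, w, c = feat.shape

    # Fine-scale window: win x win tokens at full resolution.
    fine = feat[top:top + win, left:left + win].reshape(-1, c)

    # Coarse-scale window: win x win tokens from the pooled map, roughly
    # centred on the fine window, spanning 2*win x 2*win original pixels.
    coarse_map = avg_pool2x(feat)
    cy = (top + win // 2) // 2
    cx = (left + win // 2) // 2
    ct = min(max(cy - win // 2, 0), h // 2 - win)
    cl = min(max(cx - win // 2, 0), w // 2 - win)
    coarse = coarse_map[ct:ct + win, cl:cl + win].reshape(-1, c)

    # Assumption: identity projections stand in for learned Q/K/V layers.
    q = fine
    kv = np.concatenate([fine, coarse], axis=0)

    # Scaled dot-product attention over both scales at once.
    weights = softmax(q @ kv.T / np.sqrt(c), axis=-1)
    return weights @ kv, weights


rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16, 8))
out, weights = cross_scale_window_attention(feat, top=4, left=8, win=4)
print(out.shape, weights.shape)
```

In this sketch the coarse tokens give each query a view of a region four times larger in area than its own window at the cost of only doubling the number of keys, which is one plausible reading of how a cross-scale design could enlarge the receptive field.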