Paper Title
Depth Estimation with Simplified Transformer
Paper Authors
Paper Abstract
Transformer and its variants have recently shown state-of-the-art results in many vision tasks, ranging from image classification to dense prediction. Despite their success, limited work has been reported on improving model efficiency for deployment in latency-critical applications, such as autonomous driving and robotic navigation. In this paper, we aim to improve upon existing transformers in vision and propose a method for self-supervised monocular Depth Estimation with Simplified Transformer (DEST), which is efficient and particularly suitable for deployment on GPU-based platforms. Through strategic design choices, our model leads to significant reductions in model size, complexity, and inference latency, while achieving superior accuracy compared to the state of the art. We also show that our design generalizes well to other dense prediction tasks without bells and whistles.
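For a concrete picture of what a "simplified" transformer block for dense prediction could look like, below is a minimal, hypothetical PyTorch sketch. The specific simplifications shown (batch normalization in place of layer normalization, spatial-reduction attention, a convolutional feed-forward path, and the name `SimplifiedBlock`) are illustrative assumptions for this sketch, not the exact DEST architecture described in the paper.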
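```python
# Hypothetical sketch of a simplified transformer encoder block for dense
# prediction. The design choices here (BatchNorm instead of LayerNorm,
# spatial-reduction attention, convolutional feed-forward) are assumptions
# for illustration only, not the paper's exact architecture.
import torch
import torch.nn as nn


class SimplifiedBlock(nn.Module):
    def __init__(self, dim: int, num_heads: int = 4, sr_ratio: int = 2):
        super().__init__()
        # BatchNorm2d is GPU/TensorRT friendly and can be folded at inference.
        self.norm1 = nn.BatchNorm2d(dim)
        self.q = nn.Conv2d(dim, dim, kernel_size=1)
        # Spatial reduction shrinks key/value resolution to cut attention cost.
        self.sr = nn.Conv2d(dim, dim, kernel_size=sr_ratio, stride=sr_ratio)
        self.kv = nn.Conv2d(dim, dim * 2, kernel_size=1)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm2 = nn.BatchNorm2d(dim)
        # Purely convolutional feed-forward path keeps the block simple.
        self.ffn = nn.Sequential(
            nn.Conv2d(dim, dim * 2, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(dim * 2, dim, kernel_size=1),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, h, w = x.shape
        y = self.norm1(x)
        q = self.q(y).flatten(2).transpose(1, 2)              # (B, H*W, C)
        kv = self.kv(self.sr(y)).flatten(2).transpose(1, 2)   # (B, H'*W', 2C)
        k, v = kv.chunk(2, dim=-1)
        attn_out, _ = self.attn(q, k, v, need_weights=False)
        x = x + attn_out.transpose(1, 2).reshape(b, c, h, w)
        return x + self.ffn(self.norm2(x))


if __name__ == "__main__":
    block = SimplifiedBlock(dim=64)
    feat = torch.randn(1, 64, 48, 160)   # e.g., a KITTI-sized feature map
    print(block(feat).shape)             # torch.Size([1, 64, 48, 160])
```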
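In this kind of sketch, avoiding LayerNorm and reshaping-heavy operations keeps the block composed of convolution, batch normalization, and standard attention primitives, which tend to map well onto GPU inference runtimes; the paper's actual design choices should be taken from the full text rather than from this illustration.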