Paper Title

Laneformer: Object-aware Row-Column Transformers for Lane Detection

Paper Authors

Jianhua Han, Xiajun Deng, Xinyue Cai, Zhen Yang, Hang Xu, Chunjing Xu, Xiaodan Liang

Paper Abstract

We present Laneformer, a conceptually simple yet powerful transformer-based architecture tailored for lane detection, a long-standing research topic in visual perception for autonomous driving. The dominant paradigms rely on purely CNN-based architectures, which often fail to incorporate the relations among long-range lane points and the global context induced by surrounding objects (e.g., pedestrians, vehicles). Inspired by recent advances of the transformer encoder-decoder architecture in various vision tasks, we design a new end-to-end Laneformer architecture that adapts conventional transformers to better capture the shape and semantic characteristics of lanes, with minimal latency overhead. First, coupled with deformable pixel-wise self-attention in the encoder, Laneformer introduces two new row and column self-attention operations to efficiently mine point context along the lane shapes. Second, motivated by the observation that surrounding objects affect the prediction of lane segments, Laneformer further includes detected object instances as extra inputs to the multi-head attention blocks in the encoder and decoder, facilitating lane point detection by sensing semantic context. Specifically, the bounding-box locations of objects are added to the Key module to interact with each pixel and query, while the ROI-aligned features are inserted into the Value module. Extensive experiments demonstrate that Laneformer achieves state-of-the-art performance on the CULane benchmark with a 77.1% F1 score. We hope our simple and effective Laneformer will serve as a strong baseline for future research on self-attention models for lane detection.
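
To make the first mechanism concrete, here is a minimal PyTorch sketch of row- and column-wise self-attention. This is an illustration, not the authors' released code: the class name is hypothetical, standard `nn.MultiheadAttention` is assumed in place of whatever attention variant the paper uses internally, and the paper's encoder additionally couples this with deformable pixel-wise attention, which is omitted here.

```python
import torch
import torch.nn as nn

class RowColumnSelfAttention(nn.Module):
    """Sketch of row- and column-wise self-attention: every pixel attends
    to the other pixels in its own row, then in its own column, which
    matches the long, thin shape of lanes and is cheaper than full
    pixel-wise attention over the whole feature map."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.row_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.col_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (B, C, H, W) feature map from the CNN backbone
        b, c, h, w = x.shape
        # Row attention: each of the B*H rows is a sequence of W tokens.
        rows = x.permute(0, 2, 3, 1).reshape(b * h, w, c)
        rows, _ = self.row_attn(rows, rows, rows)
        x = rows.reshape(b, h, w, c).permute(0, 3, 1, 2)
        # Column attention: each of the B*W columns is a sequence of H tokens.
        cols = x.permute(0, 3, 2, 1).reshape(b * w, h, c)
        cols, _ = self.col_attn(cols, cols, cols)
        return cols.reshape(b, w, h, c).permute(0, 3, 2, 1)

# Usage: the module preserves the feature-map shape.
attn = RowColumnSelfAttention(dim=256)
feat = torch.randn(2, 256, 40, 100)  # e.g. a downsampled feature map
out = attn(feat)                     # (2, 256, 40, 100)
```

Restricting attention to single rows and columns reduces the token count per attention call from H*W to W or H, which is where the low latency overhead claimed in the abstract comes from.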
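
Likewise, a hedged sketch of the object-aware attention described above, where detected boxes enter the Key and ROI-aligned features enter the Value. The module name, the linear box encoding, and the tensor layout are our assumptions for illustration; the paper may encode boxes differently.

```python
import torch
import torch.nn as nn

class ObjectAwareAttention(nn.Module):
    """Sketch of object-aware attention: detected object instances are fed
    into the attention block as extra tokens, with encoded bounding-box
    locations appended to the Keys and ROI-aligned features appended to
    the Values, so lane queries can sense surrounding semantic context."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.box_to_key = nn.Linear(4, dim)      # encode (x1, y1, x2, y2) boxes
        self.roi_to_value = nn.Linear(dim, dim)  # project ROI-aligned features
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, queries, feat_tokens, boxes, roi_feats):
        # queries:     (B, Nq, C) lane queries (or pixel tokens in the encoder)
        # feat_tokens: (B, Np, C) flattened image feature tokens
        # boxes:       (B, No, 4) detected boxes in normalized coordinates
        # roi_feats:   (B, No, C) ROI-aligned feature of each detected box
        obj_keys = self.box_to_key(boxes)        # box locations -> extra Keys
        obj_vals = self.roi_to_value(roi_feats)  # ROI features  -> extra Values
        keys = torch.cat([feat_tokens, obj_keys], dim=1)
        values = torch.cat([feat_tokens, obj_vals], dim=1)
        out, _ = self.attn(queries, keys, values)
        return out
```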
