Paper Title


A Dual-Cycled Cross-View Transformer Network for Unified Road Layout Estimation and 3D Object Detection in the Bird's-Eye-View

Paper Authors

Curie Kim, Ue-Hwan Kim

Paper Abstract


The bird's-eye-view (BEV) representation allows robust learning of multiple tasks for autonomous driving, including road layout estimation and 3D object detection. However, contemporary methods for unified road layout estimation and 3D object detection rarely handle the class imbalance of the training dataset, nor multi-class learning as a way to reduce the total number of networks required. To overcome these limitations, we propose a unified model for road layout estimation and 3D object detection inspired by the transformer architecture and the CycleGAN learning framework. The proposed model counters the performance degradation caused by the class imbalance of the dataset by utilizing the focal loss and the proposed dual cycle loss. Moreover, we set up extensive learning scenarios to study the effect of multi-class learning for road layout estimation in various situations. To verify the effectiveness of the proposed model and the learning scheme, we conduct a thorough ablation study and a comparative study. The experimental results attest to the effectiveness of our model; we achieve state-of-the-art performance in both the road layout estimation and 3D object detection tasks.
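The abstract attributes part of the robustness to the focal loss, which down-weights well-classified (majority-class) examples so that rare classes contribute more to the gradient. As a rough illustration only (not the authors' implementation, whose exact weighting and the proposed dual cycle loss are defined in the paper), a minimal NumPy sketch of the standard binary focal loss looks like this:

```python
import numpy as np

def focal_loss(probs, targets, alpha=0.25, gamma=2.0, eps=1e-7):
    """Binary focal loss sketch: -alpha_t * (1 - p_t)^gamma * log(p_t).

    probs   : predicted probabilities in (0, 1)
    targets : binary ground-truth labels (0 or 1)
    The (1 - p_t)^gamma factor shrinks the loss of confident, correct
    predictions, letting rare-class examples dominate training.
    """
    probs = np.clip(probs, eps, 1.0 - eps)
    # p_t: probability the model assigned to the true class
    p_t = np.where(targets == 1, probs, 1.0 - probs)
    alpha_t = np.where(targets == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))
```

With `gamma = 2`, a confidently correct prediction (e.g. `p_t = 0.9`) incurs only 1% of the weighted cross-entropy it would otherwise receive, which is the mechanism that mitigates class imbalance in dense BEV maps.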
