Paper title
SyNet: An Ensemble Network for Object Detection in UAV Images
Paper authors
Paper abstract
Recent advances in camera-equipped drone applications and their widespread use have increased the demand for vision-based object detection algorithms for aerial images. Object detection is inherently challenging as a generic computer vision problem; because the use of object detection algorithms on UAVs (drones) is a relatively new area, detecting objects in aerial images remains even more challenging. There are several reasons for this, including: (i) the lack of large drone datasets with large object variance, (ii) the larger orientation and scale variance in drone images compared to ground images, and (iii) the difference in texture and shape features between ground and aerial images. Deep-learning-based object detection algorithms fall into two main categories: (a) single-stage detectors and (b) multi-stage detectors. Each category has advantages and disadvantages relative to the other, so a technique that combines the strengths of both could yield a stronger solution than either alone. In this paper, we propose an ensemble network, SyNet, that combines a multi-stage method with a single-stage one, with the motivation of decreasing the high false negative rate of multi-stage detectors and increasing the quality of the single-stage detector proposals. As building blocks, CenterNet and Cascade R-CNN with pretrained feature extractors are used along with an ensembling strategy. We report state-of-the-art results obtained by our proposed solution on two datasets, MS-COCO and VisDrone: 52.1\% $mAP_{IoU = 0.75}$ on the MS-COCO $val2017$ dataset and 26.2\% $mAP_{IoU = 0.75}$ on the VisDrone $test-set$.
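The abstract does not spell out the ensembling strategy, so the following is only a minimal sketch of one common way to combine the outputs of two detectors: pool the detections from a single-stage and a multi-stage model, then apply greedy per-class non-maximum suppression. This is an assumed stand-in, not necessarily the method used in SyNet. The names `iou`, `ensemble_nms`, and the `Detection` tuple layout are hypothetical and introduced here for illustration. The same `iou` function also shows what the reported $mAP_{IoU = 0.75}$ metric thresholds on: a detection counts as correct only if its overlap with a ground-truth box is at least 0.75.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout
Detection = Tuple[Box, float, int]       # (box, confidence score, class id)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def ensemble_nms(single_stage: List[Detection],
                 multi_stage: List[Detection],
                 iou_thr: float = 0.5) -> List[Detection]:
    """Pool both detectors' outputs, then run greedy per-class NMS.

    Assumption: scores from the two detectors are comparable as-is.
    """
    pooled = sorted(single_stage + multi_stage,
                    key=lambda d: d[1], reverse=True)
    kept: List[Detection] = []
    for det in pooled:
        box, _, cls = det
        # Keep a box unless a higher-scoring box of the same class overlaps it.
        if all(k[2] != cls or iou(box, k[0]) < iou_thr for k in kept):
            kept.append(det)
    return kept


# Toy inputs: both detectors find the class-0 object; only the
# single-stage detector finds the class-1 object.
single = [((10, 10, 50, 50), 0.6, 0), ((100, 100, 140, 140), 0.7, 1)]
multi = [((12, 12, 52, 52), 0.9, 0)]
merged = ensemble_nms(single, multi)
```

In this toy case the duplicate class-0 box is suppressed in favor of the higher-scoring one, while the class-1 box found only by the single-stage detector survives, which illustrates how pooling can reduce false negatives.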