Paper Title
Accelerating Deep Learning Applications in Space
Paper Authors
Paper Abstract
Computing at the edge offers intriguing possibilities for the development of autonomy and artificial intelligence. Advances in autonomous technologies and the resurgence of computer vision have led to rising demand for fast and reliable deep learning applications. In recent years, the industry has introduced devices with impressive processing power for a variety of object detection tasks. However, for real-time detection, such devices are constrained in memory, computational capacity, and power, which may compromise overall performance. This can be addressed either by optimizing the object detector or by modifying the images. In this paper, we investigate the performance of CNN-based object detectors on constrained devices when applying different image compression techniques. We examine the capabilities of an NVIDIA Jetson Nano: a low-power, high-performance computer with an integrated GPU, small enough to fit on board a CubeSat. We take a closer look at the Single Shot MultiBox Detector (SSD) and the Region-based Fully Convolutional Network (R-FCN), both pre-trained on DOTA, a large-scale dataset for object detection in aerial images. Performance is measured in terms of inference time, memory consumption, and accuracy. By applying image compression techniques, we are able to optimize performance. The two techniques applied, lossless compression and image scaling, improve speed and memory consumption with little or no change in accuracy. The image scaling technique achieves a 100% runnable dataset, and we suggest combining both techniques to optimize the speed/memory/accuracy trade-off.
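To make the two preprocessing techniques named in the abstract concrete, the following is a minimal Python sketch, not the paper's actual pipeline. It assumes the Pillow library; the function name preprocess, the scale factor, and the choice of PNG for lossless re-encoding and bilinear resampling for scaling are all illustrative assumptions.

from PIL import Image

def preprocess(src_path, dst_path, scale=0.5, lossless=True):
    """Apply image scaling and, optionally, lossless re-encoding.

    scale    -- factor applied to both dimensions (hypothetical default)
    lossless -- if True, re-encode as PNG; pixel values are unchanged
    """
    img = Image.open(src_path)
    if scale != 1.0:
        # Image scaling: shrink both dimensions to reduce the memory
        # footprint and inference time of the detector.
        new_size = (int(img.width * scale), int(img.height * scale))
        img = img.resize(new_size, Image.BILINEAR)
    if lossless:
        # Lossless compression: PNG stores the same pixels in fewer
        # bytes, so accuracy is unaffected by this step.
        img.save(dst_path, format="PNG", optimize=True)
    else:
        img.save(dst_path)

# Example: halve each dimension before feeding a tile to the detector.
# preprocess("tile_0001.jpg", "tile_0001_small.png", scale=0.5)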