Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU

Paper Title

Efficient Automatic Scheduling of Imaging and Vision Pipelines for the GPU

Authors

Luke Anderson, Andrew Adams, Karima Ma, Tzu-Mao Li, Tian Jin, Jonathan Ragan-Kelley

Abstract

We present a new algorithm to quickly generate high-performance GPU implementations of complex imaging and vision pipelines, directly from high-level Halide algorithm code. It is fully automatic, requiring no schedule templates or hand-optimized kernels. We address the scalability challenge of extending search-based automatic scheduling to map large real-world programs to the deep hierarchies of memory and parallelism on GPU architectures in reasonable compile time. We achieve this using (1) a two-phase search algorithm that first 'freezes' decisions for the lowest cost sections of a program, allowing relatively more time to be spent on the important stages, (2) a hierarchical sampling strategy that groups schedules based on their structural similarity, then samples representatives to be evaluated, allowing us to explore a large space with few samples, and (3) memoization of repeated partial schedules, amortizing their cost over all their occurrences. We guide the process with an efficient cost model combining machine learning, program analysis, and GPU architecture knowledge. We evaluate our method's performance on a diverse suite of real-world imaging and vision pipelines. Our scalability optimizations lead to average compile time speedups of 49x (up to 530x). We find schedules that are on average 1.7x faster than existing automatic solutions (up to 5x), and competitive with what the best human experts were able to achieve in an active effort to beat our automatic results.
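As a rough illustration of ideas (2) and (3) in the abstract, the following toy sketch groups candidate schedules by a coarse structural signature, evaluates only a few representatives per group, and caches the cost of repeated partial schedules. It is not the paper's implementation and does not use Halide: the schedule representation (a tuple of per-stage tile sizes), the cost function, and every name below (`stage_cost`, `structure_key`, `hierarchical_sample`, `search`) are assumptions made up for this example.

```python
import random
from collections import defaultdict
from functools import lru_cache
from itertools import product

# Hypothetical toy model: a "schedule" is a tuple holding one tile size per stage.
# None of these names or formulas come from Halide or from the paper.


@lru_cache(maxsize=None)
def stage_cost(stage: int, tile: int) -> float:
    """Stand-in cost of one partial schedule; cached so repeats cost nothing."""
    return abs(tile - 16 * (stage + 1)) + 1.0


def schedule_cost(schedule) -> float:
    """Total cost is the sum of the (memoized) per-stage costs."""
    return sum(stage_cost(i, t) for i, t in enumerate(schedule))


def structure_key(schedule):
    """Coarse structural signature: which stages are tiled at all."""
    return tuple(t > 1 for t in schedule)


def hierarchical_sample(candidates, per_group, rng):
    """Group candidates by structure, then draw a few representatives per group."""
    groups = defaultdict(list)
    for s in candidates:
        groups[structure_key(s)].append(s)
    reps = []
    for members in groups.values():
        reps.extend(rng.sample(members, min(per_group, len(members))))
    return reps


def search(candidates, per_group=2, seed=0):
    """Evaluate only the sampled representatives and return the cheapest one."""
    rng = random.Random(seed)
    reps = hierarchical_sample(candidates, per_group, rng)
    return min(reps, key=schedule_cost), len(reps)


# 4 tile choices for each of 3 stages gives 64 candidate schedules.
candidates = list(product([1, 8, 16, 32], repeat=3))
best, evaluated = search(candidates)
print(best, evaluated, stage_cost.cache_info().hits)
```

In this toy setup only a fraction of the 64 candidates is evaluated, and most per-stage cost lookups hit the cache. A real autoscheduler would replace `stage_cost` with a learned cost model and `structure_key` with a much richer notion of structural similarity.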
