Paper Title
Image Gradient Decomposition for Parallel and Memory-Efficient Ptychographic Reconstruction
Paper Authors
Paper Abstract
Ptychography is a popular microscopic imaging modality behind many scientific discoveries, and it holds the record for the highest image resolution. Unfortunately, high-resolution ptychographic reconstruction requires a significant amount of memory and computation, forcing many applications to compromise on image resolution in exchange for a smaller memory footprint and a shorter reconstruction time. In this paper, we propose a novel image gradient decomposition method that significantly reduces the memory footprint of ptychographic reconstruction by tessellating image gradients and diffraction measurements into tiles. In addition, we propose a parallel image gradient decomposition method that enables asynchronous point-to-point communication and parallel pipelining with minimal overhead on a large number of GPUs. Our experiments on a lead titanate (PbTiO3) dataset with 16,632 probe locations show that our gradient decomposition algorithm reduces the memory footprint by 51 times. In addition, it achieves a time-to-solution within 2.2 minutes by scaling to 4,158 GPUs, with a super-linear strong scaling efficiency of 364% relative to the runtime on 6 GPUs. This performance is 2.7 times more memory efficient, 9 times more scalable, and 86 times faster than the state-of-the-art algorithm.
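For readers unfamiliar with the metric, the sketch below shows how a strong scaling efficiency relative to a baseline run is conventionally computed: the measured speedup divided by the ideal (linear) speedup. The abstract does not spell out its formula, so this standard definition is an assumption, as are the function name and the runtimes in the usage line, which are hypothetical and not taken from the paper.

```python
def strong_scaling_efficiency(base_gpus, base_runtime, gpus, runtime):
    """Strong-scaling efficiency relative to a baseline run.

    1.0 means ideal linear scaling; values above 1.0 are super-linear.
    Assumes the problem size is fixed across both runs.
    """
    speedup = base_runtime / runtime   # how much faster the larger run is
    ideal_speedup = gpus / base_gpus   # speedup expected from linear scaling
    return speedup / ideal_speedup


# Hypothetical runtimes (not from the paper): doubling the GPUs cuts the runtime 4x.
efficiency = strong_scaling_efficiency(base_gpus=6, base_runtime=100.0,
                                       gpus=12, runtime=25.0)
print(f"{efficiency:.0%}")  # prints 200%
```

An efficiency above 100%, such as the 364% reported here, means the runtime fell faster than the GPU count grew.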