Paper Title
SDC Resilient Error-bounded Lossy Compressor
Paper Authors
Abstract
Lossy compression is one of the most important strategies for coping with the large volumes of data produced by scientific applications, yet little work has been done to make it resilient against silent data corruptions (SDC). SDC is becoming non-negligible for two reasons: exascale computing demands on complex scientific simulations that produce vast volumes of data, and particular instruments or devices (such as interplanetary space probes) that must transfer large amounts of data in an error-prone environment. In this paper, we propose an SDC-resilient, error-bounded lossy compressor built upon the SZ compression framework. Specifically, we adopt a new independent-block-wise model that decomposes the entire dataset into many independent sub-blocks, each compressed separately. We then design and implement a series of error detection/correction strategies based on SZ. We are the first to extend algorithm-based fault tolerance (ABFT) to lossy compression. Our proposed solution incurs negligible execution overhead in the absence of soft errors. When soft errors do occur, it keeps the decompressed data correct, that is, still bounded within the user's error requirement, with only a very limited degradation in compression ratio.
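The abstract does not spell out the block model or the detection strategies, so the following is only a minimal sketch of the general idea, not the paper's SZ-based implementation. It assumes plain uniform scalar quantization in place of SZ's prediction-based pipeline, and a simple integer-sum checksum per block as a stand-in for an ABFT-style check. All function names are hypothetical.

```python
def compress_block(values, error_bound):
    """Quantize one block on its own and attach a checksum.

    Assumption: uniform quantization with step 2 * error_bound, so that
    each reconstructed value differs from the original by at most
    error_bound (up to floating-point rounding).
    """
    step = 2.0 * error_bound
    codes = [round(v / step) for v in values]
    checksum = sum(codes)  # assumed ABFT-style checksum over integer codes
    return codes, checksum


def compress(data, error_bound, block_size):
    """Split data into independent blocks and compress each separately."""
    return [
        compress_block(data[i:i + block_size], error_bound)
        for i in range(0, len(data), block_size)
    ]


def find_corrupted_blocks(blocks):
    """Return indices of blocks whose codes no longer match the checksum."""
    return [i for i, (codes, checksum) in enumerate(blocks)
            if sum(codes) != checksum]


def decompress_block(codes, checksum, error_bound):
    """Verify the checksum, then reconstruct the block's values."""
    if sum(codes) != checksum:
        raise ValueError("checksum mismatch: block is corrupted")
    step = 2.0 * error_bound
    return [c * step for c in codes]
```

Because each block carries its own codes and checksum, a bit flip in one block can be detected and handled (for example, by recompressing that block) without touching the others. A sum checksum only detects corruption of the quantized codes in this sketch; it does not by itself correct them.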