Title

Identity Preserving Loss for Learned Image Compression

Authors

Jiuhong Xiao, Lavisha Aggarwal, Prithviraj Banerjee, Manoj Aggarwal, Gerard Medioni

Abstract


Deep learning model inference on embedded devices is challenging due to the limited availability of computation resources. A popular alternative is to perform model inference on the cloud, which requires transmitting images from the embedded device to the cloud. Image compression techniques are commonly employed in such cloud-based architectures to reduce transmission latency over low bandwidth networks. This work proposes an end-to-end image compression framework that learns domain-specific features to achieve higher compression ratios than standard HEVC/JPEG compression techniques while maintaining accuracy on downstream tasks (e.g., recognition). Our framework does not require fine-tuning of the downstream task, which allows us to drop-in any off-the-shelf downstream task model without retraining. We choose faces as an application domain due to the ready availability of datasets and off-the-shelf recognition models as representative downstream tasks. We present a novel Identity Preserving Reconstruction (IPR) loss function which achieves Bits-Per-Pixel (BPP) values that are ~38% and ~42% of CRF-23 HEVC compression for LFW (low-resolution) and CelebA-HQ (high-resolution) datasets, respectively, while maintaining parity in recognition accuracy. The superior compression ratio is achieved as the model learns to retain the domain-specific features (e.g., facial features) while sacrificing details in the background. Furthermore, images reconstructed by our proposed compression model are robust to changes in downstream model architectures. We show at-par recognition performance on the LFW dataset with an unseen recognition model while retaining a lower BPP value of ~38% of CRF-23 HEVC compression.
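The abstract describes the IPR loss as combining pixel-level reconstruction with a term that preserves the identity embedding produced by a frozen off-the-shelf recognition model. The exact formulation and weighting are not given in the abstract, so the sketch below is only an illustration of that idea: mean-squared reconstruction error plus the cosine distance between face embeddings of the original and reconstructed images, with a hypothetical weight `lam`.

```python
import numpy as np

def identity_preserving_loss(x, x_hat, embed, lam=1.0):
    """Illustrative sketch of an identity-preserving reconstruction loss.

    x, x_hat : original and reconstructed images (arrays of equal shape).
    embed    : frozen off-the-shelf recognition model mapping an image to
               a 1-D identity embedding (stand-in for the paper's model).
    lam      : hypothetical weight on the identity term; the paper's
               actual formulation and weights are not in the abstract.
    """
    # Pixel-level reconstruction term (MSE).
    recon = float(np.mean((x - x_hat) ** 2))
    # Identity term: cosine distance between identity embeddings.
    e, e_hat = embed(x), embed(x_hat)
    cos = np.dot(e, e_hat) / (np.linalg.norm(e) * np.linalg.norm(e_hat))
    identity = 1.0 - cos
    return recon + lam * identity
```

Because the recognition model stays frozen, only the compression network receives gradients through both terms, which matches the abstract's claim that any off-the-shelf downstream model can be dropped in without retraining.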
