Rate-Distortion in Image Coding for Machines

Paper Title

Rate-Distortion in Image Coding for Machines

Authors

Harell, Alon, De Andrade, Anderson, Bajic, Ivan V.

Abstract

In recent years, there has been a sharp increase in the transmission of images to remote servers specifically for the purpose of computer vision. In many applications, such as surveillance, images are mostly transmitted for automated analysis and are rarely seen by humans. Using traditional compression for this scenario has been shown to be inefficient in terms of bit-rate, likely due to the focus on human-based distortion metrics. Thus, it is important to create specific image coding methods for joint use by humans and machines. One way to create the machine side of such a codec is to perform feature matching of some intermediate layer in a deep neural network performing the machine task. In this work, we explore the effects of the layer choice used in training a learnable codec for humans and machines. We prove, using the data processing inequality, that matching features from deeper layers is preferable in the rate-distortion sense. Next, we confirm our findings empirically by re-training an existing model for scalable human-machine coding. In our experiments, we show the trade-off between the human and machine sides of such a scalable model, and discuss the benefit of using deeper layers for training in that regard.
