Paper Title
Bounds for Learning Lossless Source Coding
Paper Authors
Paper Abstract
This paper asks a basic question: how much training is required to beat a universal source coder? Traditionally, there have been two types of source coders: fixed, optimal coders, such as Huffman coders; and universal source coders, such as Lempel-Ziv. The paper considers a third type of source coder: learned coders. These are coders that are trained on data of a particular type, and then used to encode new data of that type. This kind of coder has recently become very popular for (lossy) image and video coding. The paper considers two criteria for the performance of learned coders: the average performance over training data, and a guaranteed performance over all training data except for some error probability $P_e$. In both cases the coders are evaluated with respect to redundancy. The paper considers the IID binary case and binary Markov chains. In both cases it is shown that the amount of training data required is very moderate: to code sequences of length $l$, the amount of training data required to beat a universal source coder is $m=K\frac{l}{\log l}$, where the constant in front depends on the case considered.
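As a rough illustration of the scaling $m = K\,l/\log l$ stated in the abstract, the following sketch tabulates the required amount of training data for a few block lengths. The constant $K$ and the base of the logarithm depend on the case considered in the paper; $K = 1$ and the natural logarithm are placeholder assumptions here, used only to show that $m$ grows sublinearly in $l$.

```python
import math

def training_samples_needed(l: int, K: float = 1.0) -> float:
    """Training data needed to beat a universal coder at block length l,
    per the scaling m = K * l / log l.  K = 1 and the natural log are
    placeholder assumptions; the paper's constant depends on the source."""
    return K * l / math.log(l)

for l in [2**10, 2**14, 2**18]:
    m = training_samples_needed(l)
    # The ratio m / l shrinks as l grows, so the training requirement
    # is modest relative to the length of the sequences being coded.
    print(f"l = {l:7d}  ->  m = {m:10.1f}  (m/l = {m / l:.3f})")
```

Note that $m/l = K/\log l \to 0$, which is one way to read the paper's claim that the required amount of training is "very moderate."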