Paper Title

Optimal Compression for Minimizing Classification Error Probability: An Information-Theoretic Approach

Paper Authors

Jingchao Gao, Ao Tang, Weiyu Xu

Paper Abstract

We formulate the problem of performing optimal data compression under the constraint that the compressed data can still be used for accurate classification in machine learning. We show that this translates to the problem of minimizing the mutual information between the data and its compressed version, under the constraint that the classification error probability remains small when the compressed data are used for machine learning. We then provide analytical and computational methods to characterize the optimal trade-off between data compression and classification error probability. First, we give an analytical characterization of the optimal compression strategy for data with binary labels. Second, for data with multiple labels, we formulate a set of convex optimization problems that characterize the optimal trade-off; solving these problems numerically yields the optimal trade-off between classification error and compression efficiency. We further show that our formulations improve on information-bottleneck methods in classification performance.
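The abstract's trade-off pits the mutual information I(X;T) retained by a compression map against the error probability of classifying from the compressed variable T. As a minimal numerical sketch (the toy source, labels, and compression map below are illustrative assumptions, not taken from the paper), both quantities can be evaluated for a candidate stochastic map q(t|x):

```python
import numpy as np

def mutual_information(p_x, q_tx):
    """I(X;T) in nats for prior p_x[x] and channel q_tx[t, x] = q(t|x)."""
    p_xt = q_tx * p_x                       # joint p(x, t), shape (T, X)
    p_t = p_xt.sum(axis=1, keepdims=True)   # marginal p(t)
    # ratio p(x,t) / (p(t) p(x)); zero-probability cells contribute 0
    ratio = np.divide(p_xt, p_t * p_x, out=np.ones_like(p_xt), where=p_xt > 0)
    return float((p_xt * np.log(ratio)).sum())

def map_error(p_xy, q_tx):
    """Error probability of the MAP classifier y_hat(t) = argmax_y p(y, t)."""
    p_yt = q_tx @ p_xy                      # p(y, t) = sum_x q(t|x) p(x, y)
    return float(1.0 - p_yt.max(axis=1).sum())

# Toy source: 4 symbols with binary labels; rows index x, columns index y.
p_xy = np.array([[0.30, 0.00],
                 [0.00, 0.20],
                 [0.25, 0.00],
                 [0.00, 0.25]])
p_x = p_xy.sum(axis=1)

# Label-aware 2-cluster compression: merge symbols that share a label.
q_label = np.array([[1.0, 0.0, 1.0, 0.0],
                    [0.0, 1.0, 0.0, 1.0]])

print(mutual_information(p_x, q_label))  # I(X;T) in nats
print(map_error(p_xy, q_label))          # 0.0: labels fully preserved
```

Because this map merges only same-label symbols, the MAP error is zero while I(X;T) drops to H(T) ≈ 0.688 nats; a coarser map (e.g., compressing everything to one symbol) drives I(X;T) to zero but raises the error to the prior-guessing rate, which is the tension the paper's convex programs characterize.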
