Paper Title
Robust Fine-Tuning of Deep Neural Networks with Hessian-based Generalization Guarantees

Authors

Haotian Ju, Dongyue Li, Hongyang R. Zhang

Abstract
We consider fine-tuning a pretrained deep neural network on a target task. We study the generalization properties of fine-tuning to understand the problem of overfitting, which has often been observed (e.g., when the target dataset is small or when the training labels are noisy). Existing generalization measures for deep networks depend on notions such as distance from the initialization (i.e., the pretrained network) of the fine-tuned model and noise stability properties of deep networks. This paper identifies a Hessian-based distance measure through PAC-Bayesian analysis, which is shown to correlate well with observed generalization gaps of fine-tuned models. Theoretically, we prove Hessian distance-based generalization bounds for fine-tuned models. We also describe an extended study of fine-tuning against label noise, where overfitting remains a critical problem. We present an algorithm and a generalization error guarantee for this algorithm under a class conditional independent noise model. Empirically, we observe that the Hessian-based distance measure can match the scale of the observed generalization gap of fine-tuned models in practice. We also test our algorithm on several image classification tasks with noisy training labels, showing gains over prior methods and decreases in the Hessian distance measure of the fine-tuned model.
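The abstract's central quantity is a Hessian-based distance between the fine-tuned weights and the pretrained initialization. As a minimal sketch of that flavor of measure, the toy example below computes a Hessian-weighted distance from initialization, sqrt((w − w0)ᵀ H (w − w0)), for logistic regression, where H is the analytic Hessian of the mean loss. The functional form and all names here are illustrative assumptions, not the paper's exact measure or bound.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_hessian(X, w):
    """Hessian of the mean logistic loss at w: (1/n) X^T diag(p(1-p)) X."""
    p = sigmoid(X @ w)
    d = p * (1.0 - p)                 # per-example curvature weights
    return (X.T * d) @ X / X.shape[0]

def hessian_distance(w, w0, H):
    """Square root of the Hessian-weighted squared distance from init."""
    v = w - w0
    return float(np.sqrt(v @ H @ v))

# Toy data: w0 plays the role of the pretrained weights, w the fine-tuned ones.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w0 = np.zeros(5)                      # "pretrained" initialization
w = w0 + 0.1 * rng.normal(size=5)     # "fine-tuned" weights
H = logistic_hessian(X, w)
print(hessian_distance(w, w0, H))
```

For deep networks, H is too large to form explicitly; in practice one would estimate such quantities with Hessian-vector products, but the quadratic form above conveys why the measure shrinks when the fine-tuned model stays close to initialization along low-curvature directions.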