Paper Title
Two-Stage Fine-Tuning: A Novel Strategy for Learning Class-Imbalanced Data
Paper Authors
Paper Abstract
Classification on long-tailed distributed data is a challenging problem: it suffers from severe class imbalance and hence poor performance on tail classes, which have only a few samples. Owing to this paucity of samples, learning on the tail classes is especially challenging during fine-tuning when transferring a pretrained model to a downstream task. In this work, we present a simple modification of standard fine-tuning to cope with these challenges. Specifically, we propose a two-stage fine-tuning: we first fine-tune the final layer of the pretrained model with a class-balanced reweighting loss, and then we perform standard fine-tuning. Our modification has several benefits: (1) it leverages pretrained representations by fine-tuning only a small portion of the model parameters while keeping the rest untouched; (2) it allows the model to learn an initial representation of the specific task; and, importantly, (3) it protects the learning of tail classes from being at a disadvantage during the model updating. We conduct extensive experiments on synthetic datasets for both two-class and multi-class text classification tasks, as well as a real-world application to ADME (i.e., absorption, distribution, metabolism, and excretion) semantic labeling. The experimental results show that the proposed two-stage fine-tuning outperforms both fine-tuning with a conventional loss and fine-tuning with a reweighting loss on the above datasets.
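The two stages described in the abstract can be sketched in PyTorch as follows. This is a minimal illustration, not the paper's exact setup: the toy encoder, the data sizes, and the choice of an effective-number class-balanced weighting (one common reweighting scheme) are all assumptions made for the example.

```python
# Sketch of two-stage fine-tuning for class-imbalanced data (illustrative).
import torch
import torch.nn as nn

def class_balanced_weights(counts, beta=0.999):
    """One possible reweighting: effective-number class weights,
    w_c proportional to (1 - beta) / (1 - beta ** n_c), normalized to mean 1."""
    counts = torch.tensor(counts, dtype=torch.float)
    w = (1.0 - beta) / (1.0 - torch.pow(beta, counts))
    return w / w.sum() * len(counts)

# Toy stand-ins for a pretrained encoder and its classification head.
encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU())
head = nn.Linear(32, 2)

# Imbalanced toy data: 90 samples of the head class, 10 of the tail class.
x = torch.randn(100, 16)
y = torch.cat([torch.zeros(90, dtype=torch.long),
               torch.ones(10, dtype=torch.long)])

# Stage 1: freeze the encoder; fine-tune only the final layer
# with the class-balanced reweighting loss.
for p in encoder.parameters():
    p.requires_grad = False
weights = class_balanced_weights([90, 10])
stage1_loss = nn.CrossEntropyLoss(weight=weights)
opt = torch.optim.Adam(head.parameters(), lr=1e-2)
for _ in range(50):
    opt.zero_grad()
    stage1_loss(head(encoder(x)), y).backward()
    opt.step()

# Stage 2: unfreeze everything and perform standard fine-tuning
# with the conventional (unweighted) loss.
for p in encoder.parameters():
    p.requires_grad = True
stage2_loss = nn.CrossEntropyLoss()
opt = torch.optim.Adam(list(encoder.parameters()) + list(head.parameters()),
                       lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    stage2_loss(head(encoder(x)), y).backward()
    opt.step()
```

Stage 1 gives the classifier a task-specific initialization in which the tail class is not drowned out, so that the full update in stage 2 starts from a representation that already accounts for the imbalance.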