Paper Title
giMLPs: Gate with Inhibition Mechanism in MLPs
Paper Authors
Paper Abstract
This paper presents a new model architecture, the gate with inhibition MLP (giMLP). The gate with inhibition on CycleMLP (gi-CycleMLP) produces performance on par with the original on the ImageNet classification task, and it also improves the BERT, RoBERTa, and DeBERTaV3 models through two novel techniques. The first is the gating MLP, where matrix products between the MLP branch and the trunk Attention input further adjust the model's adaptation. The second is inhibition, which weakens or strengthens the branch adjustment; as the inhibition level increases, it imposes a stronger restriction on the model's features. We show that gi-CycleMLP with a low inhibition level is competitive with the original CycleMLP in terms of ImageNet classification accuracy. In addition, we show through a comprehensive empirical study that these techniques significantly improve performance when fine-tuning on NLU downstream tasks. For fine-tuning DeBERTa with gate-with-inhibition MLPs (giDeBERTa), we find that it achieves appealing results on most NLU tasks without any additional pretraining. We also find that with the gate with inhibition, the activation function should have a short and smooth negative tail, so that unimportant features, or features that hurt the model, can be moderately inhibited. Experiments on ImageNet and twelve language downstream tasks demonstrate the effectiveness of the gate with inhibition, both for image classification and for enhancing the capacity of natural language fine-tuning without any extra pretraining.
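To make the gating-with-inhibition idea concrete, the sketch below shows one plausible reading of the abstract in PyTorch: a gating MLP computes per-feature gates from a branch input, the gates are multiplied element-wise with the trunk (attention) input, and an inhibition level shifts the gates downward so that weak features are moderately suppressed. This is a minimal illustration only; the module name, the shifted-sigmoid gate, and all hyperparameters are assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn


class GateWithInhibition(nn.Module):
    """Minimal sketch of a gate-with-inhibition branch (assumed formulation)."""

    def __init__(self, dim: int, hidden_dim: int, inhibition: float = 0.1):
        super().__init__()
        # Gating MLP that produces per-feature gate values from the branch input.
        self.gate_mlp = nn.Sequential(
            nn.Linear(dim, hidden_dim),
            nn.GELU(),  # activation with a short, smooth negative tail
            nn.Linear(hidden_dim, dim),
        )
        # Higher inhibition imposes a stronger restriction on the gated features.
        self.inhibition = inhibition

    def forward(self, trunk: torch.Tensor, branch: torch.Tensor) -> torch.Tensor:
        # Gate in (0, 1), shifted down by the inhibition level so that
        # unimportant features are mildly suppressed rather than passed through.
        gate = torch.sigmoid(self.gate_mlp(branch)) - self.inhibition
        return trunk * gate


# Usage: gate a trunk attention output with an MLP branch of the same shape.
x_trunk = torch.randn(2, 16, 64)   # (batch, tokens, dim)
x_branch = torch.randn(2, 16, 64)
gi = GateWithInhibition(dim=64, hidden_dim=128, inhibition=0.1)
out = gi(x_trunk, x_branch)        # same shape as the trunk input
```

Under this reading, setting the inhibition level to 0 recovers a plain multiplicative gate, while larger values push more of the gate values below zero and progressively restrict the features that reach the rest of the network.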