Paper Title

Transformer Based Multi-Source Domain Adaptation

Authors

Dustin Wright, Isabelle Augenstein

Abstract

In practical machine learning settings, the data on which a model must make predictions often come from a different distribution than the data it was trained on. Here, we investigate the problem of unsupervised multi-source domain adaptation, where a model is trained on labelled data from multiple source domains and must make predictions on a domain for which no labelled data has been seen. Prior work with CNNs and RNNs has demonstrated the benefit of mixture of experts, where the predictions of multiple domain expert classifiers are combined; as well as domain adversarial training, to induce a domain agnostic representation space. Inspired by this, we investigate how such methods can be effectively applied to large pretrained transformer models. We find that domain adversarial training has an effect on the learned representations of these models while having little effect on their performance, suggesting that large transformer-based models are already relatively robust across domains. Additionally, we show that mixture of experts leads to significant performance improvements by comparing several variants of mixing functions, including one novel mixture based on attention. Finally, we demonstrate that the predictions of large pretrained transformer based domain experts are highly homogenous, making it challenging to learn effective functions for mixing their predictions.
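The abstract describes mixing the predictions of several per-domain expert classifiers, including a novel attention-based mixing function. Below is a minimal, hypothetical sketch of one plausible form of attention-based mixing in PyTorch; the class name, query-vector parameterisation, and dimensions are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class AttentionMixtureOfExperts(nn.Module):
    """Combine per-domain expert predictions using attention weights
    computed from a shared instance representation.
    Hypothetical sketch; not the paper's actual code."""

    def __init__(self, hidden_dim: int, num_experts: int, num_classes: int):
        super().__init__()
        # One classifier head per source domain ("domain expert").
        self.experts = nn.ModuleList(
            nn.Linear(hidden_dim, num_classes) for _ in range(num_experts)
        )
        # Learned query vector per expert, scored against the instance
        # representation to produce the mixing weights (assumed design).
        self.expert_queries = nn.Parameter(torch.randn(num_experts, hidden_dim))

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, hidden_dim) pooled encoder output, e.g. a [CLS] vector.
        expert_logits = torch.stack([e(h) for e in self.experts], dim=1)  # (batch, E, C)
        # Attention score between each instance and each expert query.
        scores = h @ self.expert_queries.t()                  # (batch, E)
        alpha = torch.softmax(scores, dim=-1).unsqueeze(-1)   # (batch, E, 1)
        # Weighted mixture of the experts' class distributions.
        probs = torch.softmax(expert_logits, dim=-1)          # (batch, E, C)
        return (alpha * probs).sum(dim=1)                     # (batch, C)


# Example: mix three domain experts over a 768-dim encoder output.
moe = AttentionMixtureOfExperts(hidden_dim=768, num_experts=3, num_classes=2)
mixed = moe(torch.randn(4, 768))
print(mixed.shape)  # torch.Size([4, 2])
```

The key design choice this sketch illustrates is that the mixing weights are instance-dependent: each example attends differently over the domain experts, rather than using a single fixed weighting shared across the whole target domain.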
