Paper Title
MetaBalance: Improving Multi-Task Recommendations via Adapting Gradient Magnitudes of Auxiliary Tasks
Paper Authors
Paper Abstract
In many personalized recommendation scenarios, the generalization ability of a target task can be improved via learning with additional auxiliary tasks alongside this target task on a multi-task network. However, this method often suffers from a serious optimization imbalance problem. On the one hand, one or more auxiliary tasks might have a larger influence than the target task and even dominate the network weights, resulting in worse recommendation accuracy for the target task. On the other hand, the influence of one or more auxiliary tasks might be too weak to assist the target task. More challenging is that this imbalance dynamically changes throughout the training process and varies across the parts of the same network. We propose a new method, MetaBalance, to balance auxiliary losses by directly manipulating their gradients w.r.t. the shared parameters in the multi-task network. Specifically, in each training iteration and adaptively for each part of the network, the gradient of an auxiliary loss is carefully reduced or enlarged to have a magnitude closer to that of the target loss's gradient, preventing auxiliary tasks from being so strong that they dominate the target task or so weak that they cannot help it. Moreover, the proximity between the gradient magnitudes can be flexibly adjusted to adapt MetaBalance to different scenarios. The experiments show that our proposed method achieves a significant improvement of 8.34% in terms of NDCG@10 upon the strongest baseline on two real-world datasets. The code of our approach can be found here: https://github.com/facebookresearch/MetaBalance
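The core idea in the abstract — rescaling an auxiliary task's gradient so its magnitude moves toward the target task's gradient magnitude, with an adjustable degree of proximity — can be illustrated with a minimal NumPy sketch. The function name, the use of raw (rather than moving-average) gradient norms, and the linear blending via a `relax` factor are simplifying assumptions for illustration, not the authors' exact implementation (see the linked repository for that).

```python
import numpy as np

def balance_aux_gradient(grad_target, grad_aux, relax=0.7, eps=1e-12):
    """Rescale an auxiliary-task gradient so its magnitude approaches
    that of the target-task gradient (simplified MetaBalance-style step).

    relax=1.0 forces the auxiliary gradient to exactly match the target
    gradient's magnitude; relax=0.0 leaves it unchanged. Intermediate
    values adjust how closely the two magnitudes align.
    """
    norm_t = np.linalg.norm(grad_target)
    norm_a = np.linalg.norm(grad_aux) + eps  # eps avoids division by zero
    # Fully balanced version: same direction, target's magnitude.
    balanced = grad_aux * (norm_t / norm_a)
    # Blend between the original and the fully balanced gradient.
    return relax * balanced + (1.0 - relax) * grad_aux

# Example: an auxiliary gradient much larger than the target's is shrunk.
g_target = np.array([0.3, 0.4])          # magnitude 0.5
g_aux = np.array([6.0, 8.0])             # magnitude 10.0 (would dominate)
g_aux_new = balance_aux_gradient(g_target, g_aux, relax=1.0)
# The shared-parameter update would then use g_target + g_aux_new.
```

In a full training loop this rescaling would be applied per iteration and separately for each shared parameter block, matching the abstract's point that the imbalance varies across parts of the network and over time.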