Paper Title

Federated Transfer Learning with Dynamic Gradient Aggregation

Paper Authors

Dimitrios Dimitriadis, Kenichi Kumatani, Robert Gmyr, Yashesh Gaur, Sefik Emre Eskimez

Paper Abstract

In this paper, a Federated Learning (FL) simulation platform is introduced. The target scenario is Acoustic Model training based on this platform. To our knowledge, due to the inherent complexity of the task, this is the first attempt to apply FL techniques to Speech Recognition. The proposed FL platform can support different tasks thanks to its modular design. As part of the platform, a novel hierarchical optimization scheme and two gradient aggregation methods are proposed, leading to almost an order-of-magnitude improvement in training convergence speed compared to other distributed or FL training algorithms such as BMUF and FedAvg. Besides the enhanced convergence speed, the hierarchical optimization offers additional flexibility in the training pipeline. On top of the hierarchical optimization, a dynamic gradient aggregation algorithm is proposed, based on data-driven weight inference; this aggregation algorithm acts as a regularizer of gradient quality. Finally, an unsupervised training pipeline tailored to FL is presented as a separate training scenario. The experimental validation of the proposed system is based on two tasks: first, the LibriSpeech task, showing a 7x speed-up and a 6% Word Error Rate reduction (WERR) compared to the baseline results; second, a session-adaptation task, providing a 20% WERR improvement over a competitive production-ready LAS model. The proposed Federated Learning system is shown to outperform the gold standard of distributed training in both convergence speed and overall model performance.
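The abstract describes dynamic gradient aggregation only at a high level: per-client gradients are combined with data-driven weights that regularize gradient quality, and the result is applied by a server-side optimizer in the hierarchical scheme. Below is a minimal sketch of that idea, assuming a softmax-over-negative-loss weighting and a momentum-based server step; both choices are illustrative assumptions, not the paper's exact rules.

```python
import numpy as np

def dynamic_gradient_aggregation(client_grads, client_losses, temperature=1.0):
    """Combine per-client gradients with data-driven weights.

    Assumption: weights come from a softmax over negative client losses,
    so clients whose gradients look more reliable contribute more. The
    abstract only states that the weights are data-driven; this exact
    weighting rule is illustrative.
    """
    losses = np.asarray(client_losses, dtype=np.float64)
    logits = -losses / temperature
    weights = np.exp(logits - logits.max())   # numerically stable softmax
    weights /= weights.sum()
    stacked = np.stack(client_grads)          # shape: (num_clients, num_params)
    return weights @ stacked                  # weighted aggregated gradient

def server_step(params, agg_grad, velocity, lr=1.0, momentum=0.9):
    """Global level of the hierarchical scheme (a hypothetical momentum step):
    clients run their own local optimizers, while the server applies the
    aggregated gradient with its own learning rate and momentum."""
    velocity = momentum * velocity + agg_grad
    return params - lr * velocity, velocity

# Toy usage: three clients, two model parameters.
grads = [np.array([0.1, -0.2]), np.array([0.4, 0.0]), np.array([-0.3, 0.5])]
losses = [0.9, 1.5, 0.7]
agg = dynamic_gradient_aggregation(grads, losses)
params, vel = server_step(np.zeros(2), agg, np.zeros(2), lr=0.5)
```

The two-level structure is the point of the sketch: the weighting step filters noisy client contributions, while the server keeps its own optimizer state (here, momentum) independent of the clients' local optimizers.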
