Paper Title

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems

Paper Authors

Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Ruiquan Ding, Mingming Sun, Ping Li

Paper Abstract

Neural networks of ads systems usually take input from multiple resources, e.g., query-ad relevance, ad features, and user portraits. These inputs are encoded into one-hot or multi-hot binary features, with typically only a tiny fraction of nonzero feature values per example. Deep learning models in the online advertising industry can have terabyte-scale parameters that fit in neither the GPU memory nor the CPU main memory on a computing node. For example, a sponsored online advertising system can contain more than $10^{11}$ sparse features, making the neural network a massive model with around 10 TB of parameters. In this paper, we introduce a distributed GPU hierarchical parameter server for massive-scale deep learning ads systems. We propose a hierarchical workflow that utilizes GPU High-Bandwidth Memory, CPU main memory, and SSD as 3-layer hierarchical storage. All the neural network training computations are contained in GPUs. Extensive experiments on real-world data confirm the effectiveness and the scalability of the proposed system. A 4-node hierarchical GPU parameter server can train a model more than 2X faster than a 150-node in-memory distributed parameter server in an MPI cluster. In addition, the price-performance ratio of our proposed system is 4-9 times better than that of an MPI-cluster solution.
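The abstract's central mechanism is the 3-layer storage hierarchy: embedding parameters for the roughly $10^{11}$ sparse feature IDs reside on SSD, while hotter parameters are staged through CPU main memory into GPU High-Bandwidth Memory, where all training computation happens. The minimal Python sketch below illustrates only that tiering-and-promotion idea; the class, the method names, and the FIFO eviction policy are hypothetical illustrations, not the paper's actual API or caching strategy.

```python
import numpy as np


class HierarchicalParameterStore:
    """Illustrative 3-tier store: 'hbm' (GPU), 'ram' (CPU), 'ssd' (disk).
    Hypothetical sketch of the hierarchy described in the abstract, not the
    paper's implementation; plain dicts stand in for the real storage tiers."""

    def __init__(self, hbm_capacity, ram_capacity, embedding_dim):
        self.embedding_dim = embedding_dim
        self.hbm = {}  # fastest, smallest tier (GPU High-Bandwidth Memory)
        self.ram = {}  # middle tier (CPU main memory)
        self.ssd = {}  # slowest, largest tier (stands in for SSD files)
        self.hbm_capacity = hbm_capacity
        self.ram_capacity = ram_capacity

    def _evict_if_full(self, tier, capacity, lower_tier):
        # Naive FIFO eviction (insertion-ordered dicts); a real system would
        # use a working-set-aware cache policy.
        while len(tier) >= capacity:
            oldest_key = next(iter(tier))
            lower_tier[oldest_key] = tier.pop(oldest_key)

    def lookup(self, feature_id):
        """Fetch the embedding for one sparse feature ID, promoting it upward."""
        if feature_id in self.hbm:
            return self.hbm[feature_id]
        if feature_id in self.ram:
            value = self.ram.pop(feature_id)
        elif feature_id in self.ssd:
            value = self.ssd.pop(feature_id)
        else:
            # Unseen feature: lazily initialize its embedding vector.
            value = np.random.randn(self.embedding_dim).astype(np.float32) * 0.01
        # Make room in the upper tiers, cascading evictions downward.
        self._evict_if_full(self.hbm, self.hbm_capacity, self.ram)
        self._evict_if_full(self.ram, self.ram_capacity, self.ssd)
        self.hbm[feature_id] = value  # promote to the fastest tier
        return value


# Usage: resolve the multi-hot features of one training example.
store = HierarchicalParameterStore(hbm_capacity=2, ram_capacity=4, embedding_dim=8)
example_features = [17, 100000000042, 17]  # a tiny subset of the ~10^11 ID space
embeddings = np.stack([store.lookup(f) for f in example_features])
print(embeddings.shape)  # (3, 8)
```

In a real system, promoting one ID at a time like this would be far too slow; presumably transfers between tiers are batched per mini-batch so that SSD and PCIe latency is amortized, but the sketch keeps to the per-lookup logic for clarity.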
