Paper Title

DeHIN: A Decentralized Framework for Embedding Large-scale Heterogeneous Information Networks

Paper Authors

Mubashir Imran, Hongzhi Yin, Tong Chen, Zi Huang, Kai Zheng

Paper Abstract

Modeling heterogeneity by extracting and exploiting high-order information from heterogeneous information networks (HINs) has attracted immense research attention in recent times. Such heterogeneous network embedding (HNE) methods effectively harness the heterogeneity of small-scale HINs. In the real world, however, the size of HINs grows exponentially as new nodes and different types of links are continuously introduced, yielding billion-scale networks. Learning node embeddings on such HINs creates a performance bottleneck for existing HNE methods, which are commonly centralized, i.e., the complete data and the model both reside on a single machine. To address large-scale HNE tasks with strong efficiency and effectiveness guarantees, we present the Decentralized Embedding Framework for Heterogeneous Information Network (DeHIN) in this paper. In DeHIN, we construct a distributed parallel pipeline that utilizes hypergraphs to infuse parallelization into the HNE task. DeHIN presents a context-preserving partition mechanism that innovatively formulates a large HIN as a hypergraph, whose hyperedges connect semantically similar nodes. Our framework then adopts a decentralized strategy to efficiently partition HINs via a tree-like pipeline. Each resulting subnetwork is then assigned to a distributed worker, which employs the deep information maximization theorem to locally learn node embeddings from the partition it receives. We further devise a novel embedding alignment scheme that precisely projects the independently learned node embeddings from all subnetworks onto a common vector space, thus allowing for downstream tasks such as link prediction and node classification.
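
To make the embedding-alignment step concrete, here is a minimal sketch, not DeHIN's published algorithm: assuming each worker's partition shares a small set of anchor nodes with a reference partition, the embeddings that worker learned independently can be mapped onto the common vector space with an orthogonal Procrustes fit. The function align_to_reference and all variable names below are hypothetical illustrations.

import numpy as np

def align_to_reference(local_emb, ref_emb, anchors_local, anchors_ref):
    """Fit an orthogonal map W sending this worker's anchor embeddings onto
    the reference anchors (orthogonal Procrustes), then apply W to all rows.

    local_emb: (n_local, d) embeddings learned by one worker.
    ref_emb:   (n_ref, d) embeddings already in the common space.
    anchors_*: integer indices of the shared anchor nodes in each matrix.
    """
    A = local_emb[anchors_local]   # (k, d) anchors in the worker's space
    B = ref_emb[anchors_ref]       # (k, d) same anchors in the common space
    # SVD of the cross-covariance gives the optimal orthogonal map:
    # W = U V^T, where A^T B = U S V^T (Schoenemann, 1966).
    U, _, Vt = np.linalg.svd(A.T @ B)
    W = U @ Vt
    return local_emb @ W           # project the whole partition at once

# Toy usage: align one worker's 8-dimensional embeddings to a reference
# space using 5 shared anchor nodes.
rng = np.random.default_rng(0)
ref_emb = rng.normal(size=(100, 8))
local_emb = rng.normal(size=(80, 8))
aligned = align_to_reference(local_emb, ref_emb, np.arange(5), np.arange(5))

Because the fitted map is orthogonal, distances within each partition are preserved; the anchor set only needs to be large and well-spread enough to pin down a single d-by-d rotation per worker.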
