基于公用事业的图形汇总：新的和改进

论文标题

基于公用事业的图形汇总：新的和改进

Utility-Based Graph Summarization: New and Improved

论文作者

Hajiabadi, Mahdi, Singh, Jasbir, Srinivasan, Venkatesh, Thomo, Alex

论文摘要

图挖掘中的一个基本挑战是数据集的尺寸不断增加。图摘要旨在找到一个紧凑的表示，从而产生更快的算法和减少的存储需求。图摘要的另一面是效用损失，它降低了其可用性。我们在本文中解决的关键问题是：（1）如何汇总图形而不会丢失实用程序？（2）如何总结一定效用但超过用户指定阈值的图形？（3）如何在没有图形重建的情况下查询图形摘要？}我们还旨在通过仅使用消费级机器有效处理Web尺度图来使群众可用图形摘要。以前的作品遭受概念上的局限性和缺乏可扩展性的困扰。在这项工作中，我们做出了三个关键的贡献。首先，我们基于集团和独立的集合分解提出了一种公用事业驱动的图形摘要方法，该方法会产生明显的压缩，而效用为零。在无损图汇总中，提供的压缩明显好于最新的压缩，而运行时则低两个数量级。其次，我们为有损情况提出了一种高度可扩展的算法，该算法已预言了颠覆以前的工作的昂贵迭代过程。我们的算法通过结合减少内存技术和一种新颖的二进制搜索方法来实现这一目标。与竞争相比，我们能够在单台计算机中处理网络尺度图，而无需性能障碍，因为公用事业阈值（汇总的大小）减小。第三，我们表明我们的图形摘要可以用作IS来回答几个重要类别的查询类别，例如三角枚举，pagerank和最短路径。这与其他逐步重建原始图的作品相反，以回答查询，从而产生了额外的时间成本。

A fundamental challenge in graph mining is the ever-increasing size of datasets. Graph summarization aims to find a compact representation resulting in faster algorithms and reduced storage needs. The flip side of graph summarization is the loss of utility which diminishes its usability. The key questions we address in this paper are: (1)How to summarize a graph without any loss of utility? (2)How to summarize a graph with some loss of utility but above a user-specified threshold? (3)How to query graph summaries without graph reconstruction?} We also aim at making graph summarization available for the masses by efficiently handling web-scale graphs using only a consumer-grade machine. Previous works suffer from conceptual limitations and lack of scalability. In this work, we make three key contributions. First, we present a utility-driven graph summarization method, based on a clique and independent set decomposition, that produces significant compression with zero loss of utility. The compression provided is significantly better than state-of-the-art in lossless graph summarization, while the runtime is two orders of magnitude lower. Second, we present a highly scalable algorithm for the lossy case, which foregoes the expensive iterative process that hampers previous work. Our algorithm achieves this by combining a memory reduction technique and a novel binary-search approach. In contrast to the competition, we are able to handle web-scale graphs in a single machine without a performance impediment as the utility threshold (and size of summary) decreases. Third, we show that our graph summaries can be used as-is to answer several important classes of queries, such as triangle enumeration, Pagerank, and shortest paths. This is in contrast to other works that incrementally reconstruct the original graph for answering queries, thus incurring additional time costs.

下载PDF全文

下载文献需遵守相关版权规定

论文标题