Hierarchical Locality Sensitive Hashing for Structured Data: A Survey

Paper Title

Hierarchical Locality Sensitive Hashing for Structured Data: A Survey

Authors

Wu, Wei; Li, Bin

Abstract

Data similarity (or distance) computation is a fundamental research topic that underpins a variety of similarity-based machine learning and data mining applications. In big data analytics, computing the exact similarity of data instances is impractical because of the high computational cost. To this end, the Locality Sensitive Hashing (LSH) technique has been proposed to provide accurate estimators for various similarity measures between sets or vectors efficiently and without a learning process. Structured data (e.g., sequences, trees and graphs), which are composed of elements and the relations between them, are common in the real world, but traditional LSH algorithms cannot preserve the structure information represented as relations between elements. To address this issue, researchers have devoted considerable effort to the family of hierarchical LSH algorithms. In this paper, we survey the current progress of research into hierarchical LSH from the following perspectives: 1) Data structures, where we review various hierarchical LSH algorithms for three typical data structures and uncover their inherent connections; 2) Applications, where we review hierarchical LSH algorithms in multiple application scenarios; 3) Challenges, where we discuss some potential challenges as future directions.
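
To make the idea of an LSH estimator for set similarity concrete, the following is a minimal sketch of MinHash, a classic LSH scheme for Jaccard similarity between sets. It is not taken from the survey and is not a hierarchical LSH algorithm; the function names (`minhash_signature`, `estimate_jaccard`), the use of salted BLAKE2b as the hash family, and the choice of 512 hash functions are all assumptions made for illustration.

```python
import hashlib


def _salted_hash(item: str, index: int) -> int:
    """Hash `item` with the index-th hash function (illustrative choice: salted BLAKE2b)."""
    digest = hashlib.blake2b(
        item.encode("utf-8"),
        digest_size=8,
        salt=index.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(items: set, num_hashes: int = 512) -> list:
    """Return the MinHash signature: the minimum hash of the set under each hash function."""
    return [min(_salted_hash(x, i) for x in items) for i in range(num_hashes)]


def estimate_jaccard(sig_a: list, sig_b: list) -> float:
    """Estimate Jaccard similarity as the fraction of signature positions that agree."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(a: set, b: set) -> float:
    """Exact Jaccard similarity |A ∩ B| / |A ∪ B|, for comparison."""
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    doc_a = {f"token{i}" for i in range(0, 100)}
    doc_b = {f"token{i}" for i in range(50, 150)}
    sig_a = minhash_signature(doc_a)
    sig_b = minhash_signature(doc_b)
    print("exact:", exact_jaccard(doc_a, doc_b))
    print("estimated:", estimate_jaccard(sig_a, sig_b))
```

For each hash function, the probability that two sets share the same minimum equals their Jaccard similarity, so the fraction of matching signature positions is an unbiased estimate that needs no training. Because such a signature treats its input as an unordered set, it discards relations between elements, which is the limitation the abstract attributes to traditional LSH on structured data.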
