Paper Title
Effective and Efficient Variable-Length Data Series Analytics
Paper Authors
Paper Abstract
In the last twenty years, data series similarity search has emerged as a fundamental operation at the core of several analysis tasks and applications related to data series collections. Many solutions to different mining problems rely on similarity search. In this regard, all the proposed solutions require prior knowledge of the series length on which similarity search is performed. In several cases, the choice of length is critical and significantly influences the quality of the expected outcome. Unfortunately, the obvious brute-force solution, which provides an outcome for every length within a given range, is computationally untenable. In this Ph.D. work, we present the first solutions that inherently support scalable, variable-length similarity search in data series, applied to sequence/subsequence matching and to motif and discord discovery problems. The experimental results show that our approaches are up to orders of magnitude faster than the alternatives. They also demonstrate that we can remove the unrealistic constraint of performing analytics using a predefined length, leading to more intuitive and actionable results that would otherwise have been missed.
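To make the cost of the brute-force baseline mentioned in the abstract concrete, the following is a minimal sketch of it for subsequence matching. It is not the method proposed in this work. The z-normalized Euclidean distance, the use of query prefixes for shorter lengths, the division by the square root of the length, and all function names are assumptions made only for illustration.

```python
import math


def znorm(seq):
    """Z-normalize a sequence; a constant sequence maps to all zeros."""
    n = len(seq)
    mean = sum(seq) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in seq) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in seq]


def best_match_fixed_length(series, query):
    """Return (distance, offset) of the window of len(query) closest to query."""
    m = len(query)
    q = znorm(query)
    best = (math.inf, -1)
    for i in range(len(series) - m + 1):
        w = znorm(series[i:i + m])
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(q, w)))
        if d < best[0]:  # strict comparison keeps the earliest best offset
            best = (d, i)
    return best


def best_match_all_lengths(series, query, min_len, max_len):
    """Brute force: rerun the fixed-length search once per length in the range.

    Assumption: the query for length m is the first m points of `query`, and
    distances are divided by sqrt(m) so that lengths can be compared.
    """
    results = {}
    for m in range(min_len, max_len + 1):
        dist, offset = best_match_fixed_length(series, query[:m])
        results[m] = (dist / math.sqrt(m), offset)
    return results


if __name__ == "__main__":
    series = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3, 2, 1, 0]
    query = [1, 2, 3, 2, 1]
    for length, (dist, offset) in best_match_all_lengths(series, query, 3, 5).items():
        print(length, offset, round(dist, 6))
```

Every additional length repeats a full scan of the series, so the total work grows with the number of lengths times the series size times the subsequence length. This repeated scanning is what makes the brute-force approach untenable on large collections.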