Paper Title
Discovering Closed and Maximal Embedded Patterns from Large Tree Data
Authors
Abstract
We address the problem of summarizing embedded tree patterns extracted from large data trees. We do so by defining and mining closed and maximal embedded unordered tree patterns from a single large data tree. We design an embedded frequent pattern mining algorithm extended with a local closedness checking technique. This algorithm is called {\em closedEmbTM-eager} as it eagerly eliminates non-closed patterns. To reduce the number of intermediate patterns generated, we devise pattern search space pruning rules that proactively detect and prune branches of the pattern search space which do not correspond to closed patterns. The pruning rules are incorporated into the extended embedded pattern miner to produce a new algorithm, called {\em closedEmbTM-prune}, for mining all the closed and maximal embedded frequent patterns from large data trees. Our extensive experiments on synthetic and real large-tree datasets demonstrate that, on dense datasets, {\em closedEmbTM-prune} not only generates a complete closed and maximal pattern set which is substantially smaller than that generated by the embedded pattern miner, but also runs much faster, with negligible pattern pruning overhead.
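To make the notions of closed and maximal patterns concrete, the following is a minimal sketch of the usual definitions applied as a post-filter over an already mined set of frequent patterns. It is not the algorithm from the paper: the local closedness check and the search space pruning rules are not reproduced here. The function name `closed_and_maximal`, the `support` mapping, and the `is_proper_subpattern` predicate are all assumptions introduced for illustration, and the demo uses item sets with proper-subset inclusion as a stand-in for embedded unordered tree inclusion.

```python
from typing import Callable, Dict, Hashable, List, Tuple


def closed_and_maximal(
    support: Dict[Hashable, int],
    is_proper_subpattern: Callable[[Hashable, Hashable], bool],
) -> Tuple[List[Hashable], List[Hashable]]:
    """Filter a set of frequent patterns down to its closed and maximal ones.

    support maps each frequent pattern to its support count.
    is_proper_subpattern(p, q) must return True when p is strictly
    contained in q (hypothetical predicate; for tree patterns it would
    test embedded unordered subtree inclusion).
    """
    patterns = list(support)
    closed, maximal = [], []
    for p in patterns:
        # All frequent proper super-patterns of p.
        supers = [q for q in patterns if is_proper_subpattern(p, q)]
        # Maximal: no frequent proper super-pattern exists.
        if not supers:
            maximal.append(p)
        # Closed: every proper super-pattern has strictly lower support.
        if all(support[q] < support[p] for q in supers):
            closed.append(p)
    return closed, maximal


# Stand-in data: item sets instead of tree patterns (assumption for the demo).
demo_support = {
    frozenset("a"): 3,
    frozenset("b"): 2,
    frozenset("ab"): 2,
}
demo_closed, demo_maximal = closed_and_maximal(
    demo_support, lambda p, q: p < q  # proper-subset test on frozensets
)
```

In this demo, `{a}` is closed but not maximal, `{b}` is not closed because `{a, b}` has the same support, and `{a, b}` is both closed and maximal. Every maximal pattern is also closed, which is why the closed set is the larger of the two summaries. This quadratic post-filter only illustrates the definitions; it performs none of the early elimination the abstract describes.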