Paper title
Efficient Temporal Pattern Mining in Big Time Series Using Mutual Information -- Full Version
Paper authors
Paper abstract
Very large time series are increasingly available from an ever wider range of IoT-enabled sensors deployed in different environments. Significant insights can be gained by mining temporal patterns from these time series. Unlike traditional pattern mining, temporal pattern mining (TPM) adds event time intervals to the extracted patterns, making them more expressive at the expense of increased mining time complexity. Existing TPM methods either cannot scale to large datasets, or work only on pre-processed temporal events rather than on time series. This paper presents our Frequent Temporal Pattern Mining from Time Series (FTPMfTS) approach, which provides: (1) The end-to-end FTPMfTS process, which takes time series as input and produces frequent temporal patterns as output. (2) The efficient Hierarchical Temporal Pattern Graph Mining (HTPGM) algorithm, which uses efficient data structures for fast support and confidence computation, and employs effective pruning techniques for significantly faster mining. (3) An approximate version of HTPGM that uses mutual information, a measure of data correlation known from information theory, to prune unpromising time series from the search space. (4) An extensive experimental evaluation showing that HTPGM outperforms the baselines in runtime and memory consumption, and can scale to big datasets. The approximate HTPGM is up to two orders of magnitude faster than the baselines and consumes less memory, while retaining high accuracy.
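The mutual-information pruning idea in point (3) can be illustrated with a minimal sketch. The abstract does not say how mutual information is estimated or how the pruning threshold is chosen, so everything below is an assumption made for illustration: the function names `mutual_information` and `correlated_pairs`, the `min_mi` threshold, and the toy On/Off sensor data are hypothetical, and the estimator is the plain empirical one over symbol counts rather than the paper's own procedure.

```python
from collections import Counter
from itertools import combinations
from math import log2


def mutual_information(xs, ys):
    """Empirical mutual information (in bits) of two equal-length symbol sequences."""
    if len(xs) != len(ys):
        raise ValueError("sequences must have equal length")
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def correlated_pairs(series, min_mi):
    """Return the pairs of series names whose mutual information is at least min_mi.

    Pairs below the threshold are treated as unpromising and dropped
    (hypothetical pruning rule, assumed for this sketch).
    """
    return [
        (a, b)
        for a, b in combinations(sorted(series), 2)
        if mutual_information(series[a], series[b]) >= min_mi
    ]


# Hypothetical symbolic series, e.g. On/Off states of three sensors.
series = {
    "kettle":  ["On", "On", "Off", "Off", "On", "Off", "On", "Off"],
    "toaster": ["On", "On", "Off", "Off", "On", "Off", "On", "Off"],
    "lamp":    ["On", "Off", "On", "Off", "On", "Off", "On", "Off"],
}

kept = correlated_pairs(series, min_mi=0.5)
```

With this toy data, only the pair of identical series passes the threshold, so a pattern miner would examine one pair instead of three. The speedup from such pruning is approximate by nature: patterns that involve a dropped pair are never found.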